Abstract

Hastings recently provided a proof of the existence of channels which violate the
additivity conjecture for minimal output entropy. In this paper we present an expanded
version of Hastings' proof. In addition to a careful elucidation of the details of the proof, we
also present bounds for the minimal dimensions needed to obtain a counterexample.

1 The additivity conjectures

The classical capacity of a quantum channel is the maximum rate at which classical information 
can be reliably transmitted through the channel. This maximum rate is approached asymptotically 
with multiple channel uses by encoding the classical information in quantum states which can be 
reliably distinguished by measurements at the output. In general, in order to achieve optimal 
performance, it is necessary to use measurements which are entangled across the multiple channel 
outputs. However it was conjectured that product input states are sufficient to achieve the maximal 
rate of transmission, in other words that there is no benefit in using entangled states to encode the 
classical information. This conjecture is closely related to other additivity conjectures of quantum 
information theory, as will be explained below. Recently Hastings disproved all of these 
additivity conjectures by proving the existence of channels which violate the additivity of minimal 
output entropy. The purpose of this paper is to present in detail the findings of Hastings' paper, 
and also to find bounds for the minimal dimensions needed for this type of counterexample. 

We begin by formulating the various additivity conjectures. The Holevo capacity of a quantum
channel $\Phi$ is defined by

$$\chi^*(\Phi) = \sup_{\{p_i,\rho_i\}} \Big[ S\Big(\Phi\Big(\sum_i p_i \rho_i\Big)\Big) - \sum_i p_i\, S(\Phi(\rho_i)) \Big] \tag{1.1}$$

where the supremum runs over ensembles of input states, and where $S(\rho)$ denotes the von Neumann
entropy of the state $\rho$:

$$S(\rho) = -\mathrm{Tr}\,\rho \log \rho \tag{1.2}$$

The classical information capacity is known [15], [25] to equal the following limit:

$$C(\Phi) = \lim_{n\to\infty} \frac{1}{n}\, \chi^*(\Phi^{\otimes n}) \tag{1.3}$$

It has been a longstanding conjecture that the classical information capacity is in fact equal to the 
Holevo capacity: 



Conjecture 1
$$C(\Phi) = \chi^*(\Phi) \tag{1.4}$$

Conjecture 1 would be implied by additivity of $\chi^*$ over tensor products. This led to the following
conjecture: for all channels $\Phi$ and $\Omega$,

Conjecture 2
$$\chi^*(\Phi \otimes \Omega) = \chi^*(\Phi) + \chi^*(\Omega) \tag{1.5}$$

Subsequently a third conjecture appeared, namely the additivity of minimum output entropy:

Conjecture 3
$$S_{\min}(\Phi \otimes \Omega) = S_{\min}(\Phi) + S_{\min}(\Omega) \tag{1.6}$$

where $S_{\min}$ is defined by

$$S_{\min}(\Phi) = \inf_{\rho} S(\Phi(\rho)) \tag{1.7}$$

Finally Amosov, Holevo and Werner [3] proposed a generalization of Conjecture 3 with von Neumann
entropy replaced by the Renyi entropy: for all $p \ge 1$,

Conjecture 4
$$S_{p,\min}(\Phi \otimes \Omega) = S_{p,\min}(\Phi) + S_{p,\min}(\Omega) \tag{1.8}$$

where $S_{p,\min}$ is the minimal Renyi entropy defined for $p \ne 1$ by

$$S_{p,\min}(\Phi) = \inf_{\rho} S_p(\Phi(\rho)), \qquad S_p(\rho) = \frac{1}{1-p} \log \mathrm{Tr}\,\rho^p \tag{1.9}$$
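As a small illustration of the two entropies entering Conjectures 3 and 4, the following Python sketch evaluates the von Neumann entropy and the Renyi entropy of a state from its eigenvalues. The function names and the sample spectrum are ours, chosen only for illustration; natural logarithms are assumed.

```python
import math

def von_neumann_entropy(spectrum):
    """-sum(x log x) over the eigenvalues, with the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in spectrum if x > 0.0)

def renyi_entropy(spectrum, p):
    """(1/(1-p)) log sum(x^p), defined for p > 0 and p != 1."""
    if p == 1:
        raise ValueError("use von_neumann_entropy for p = 1")
    return math.log(sum(x ** p for x in spectrum if x > 0.0)) / (1.0 - p)

# Hypothetical spectrum of a 3-dimensional state (non-negative, sums to one).
spectrum = [0.5, 0.3, 0.2]
for p in (0.5, 0.999, 1.001, 2.0):
    print(p, renyi_entropy(spectrum, p))
print("von Neumann:", von_neumann_entropy(spectrum))
```

The Renyi entropy is non-increasing in p and approaches the von Neumann entropy as p tends to 1, which is the sense in which Conjecture 4 extends Conjecture 3.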

In 2004 Shor [27] proved the equivalence of several additivity conjectures, including Conjectures
2 and 3 above. In subsequent work [11] it was shown that Conjectures 1 and 2 are equivalent. The
conjectures have been proved in several special cases [H [21 El El [13 [TSl [ISl but recently most 
progress has been made in the search for counterexamples. This started with the Holevo-Werner
channel [28] which provided a counterexample to Conjecture 4 with p > 4.79, then more recently
Winter and Hayden found counterexamples to Conjecture 4 for all p > 1 [14], and violations have
since been found also for p = 0 and p close to zero. Finally in 2008, Hastings [12] produced a
family of channels which violate Conjecture 3, namely additivity of minimal output von Neumann
entropy, thereby also proving (via [27] and [11]) that Conjectures 1 and 2 are false.

The product channels introduced by Hastings have the form $\Phi \otimes \bar\Phi$ where $\Phi$ is a special channel
which we call a random unitary channel. This means that there are positive numbers $w_1, \dots, w_d$
with $\sum_i w_i = 1$ and unitary $n \times n$ matrices $U_1, \dots, U_d$ such that

$$\Phi(\rho) = \sum_{i=1}^d w_i\, U_i \rho U_i^*, \qquad \bar\Phi(\rho) = \sum_{i=1}^d w_i\, \bar U_i \rho U_i^T \tag{1.10}$$
These channels are chosen randomly using a distribution that depends on the two integers n and d,
where n is the dimension of the input space and d is the dimension of the environment. Hastings'
main result is that for n and d sufficiently large there are random unitary channels which violate
Conjecture 3, that is

$$S_{\min}(\Phi \otimes \bar\Phi) < S_{\min}(\Phi) + S_{\min}(\bar\Phi) \tag{1.11}$$

This result also allows a direct construction of channels which violate Conjectures 1 and 2, as
we now show. Using results from the paper [11], the inequality (1.11) implies that the additivity
of minimal output entropy does not hold for the product $\Phi' \otimes \Phi'$ where $\Phi' = \Phi \oplus \bar\Phi$. In addition,
as shown in the paper [10], there is a unital extension of $\Phi'$, denoted $\Phi''$, such that the additivity
of minimal output entropy does not hold for $\Phi'' \otimes \Phi''$ and such that

$$S_{\min}(\Phi'' \otimes \Phi'') = 2 \log D - \chi^*(\Phi'' \otimes \Phi'') \tag{1.12}$$

where D is the dimension of the output space for $\Phi''$. Thus $\Phi''$ provides a counterexample for
Conjecture 2, and

$$\lim_{k\to\infty} \frac{1}{2k}\, \chi^*\big((\Phi'')^{\otimes 2k}\big) > \chi^*(\Phi'') \tag{1.13}$$

Therefore, the classical capacity of $\Phi''$ does not equal its Holevo capacity, and this provides a
counterexample for Conjecture 1.

One key ingredient in the proof is the relative sizes of dimensions, namely $n \gg d \gg 1$, where
n is the dimension of the input space, and d is the dimension of the environment. Recall that
in the Stinespring representation a channel is viewed as a partial isometry from the input space
$\mathcal{H}_{in}$ to the product of output and environment spaces $\mathcal{H}_{out} \otimes \mathcal{H}_{env}$, followed by a partial trace
over the environment. The image of $\mathcal{H}_{in}$ under the partial isometry is a subspace of dimension
n sitting in the product $\mathcal{H}_{out} \otimes \mathcal{H}_{env}$. Making the environment dimension d much smaller than
the input dimension n should guarantee that with high probability this subspace will consist of
almost maximally entangled states. For such states the output entropy will be close to the maximal
possible value $\log d$, and therefore the minimal entropy of the channel should also (hopefully) be
close to $\log d$. At the same time the product channel $\Phi \otimes \bar\Phi$ sends the maximally entangled state
into an output with one relatively large eigenvalue, and thus one might hope to find a gap between
$S_{\min}(\Phi \otimes \bar\Phi)$ and $S_{\min}(\Phi) + S_{\min}(\bar\Phi)$. Turning this vague notion into a proof requires considerable
insight and ingenuity. In this paper we focus on the technical aspects of Hastings' proof. Some of
the estimates and inequalities derived in this paper are new, but all the main ideas and methods
are taken from [12].

The paper is organized as follows. In Section 2 we define notation and make a precise statement
of Hastings' results. In Section 3 we present some background material on probability distributions
for states and channels. In Section 4 we 'walk through' the proof of Hastings' Theorem, stating
results where needed and delineating the logic of the argument. In Section 5 we give the proofs
of various results needed in Section 4 and elsewhere. Section 6 discusses different aspects of the
proof and possible directions for further research. The Appendix contains the derivation of some
estimates needed for the proof.

2 Notation and statement of results

We will mostly avoid Dirac bra and ket notation, although it will be used in Sections 5.1 and 5.5.



2.1 Notation 

Let $\mathcal{M}_n$ denote the algebra of complex $n \times n$ matrices. The identity matrix will be denoted $I_n$, or
just $I$. The set of states in $\mathcal{M}_n$ is defined as

$$\mathcal{S}_n = \{\rho \in \mathcal{M}_n : \rho = \rho^* \ge 0,\ \mathrm{Tr}\,\rho = 1\} \tag{2.1}$$

The set of unit vectors in $\mathbb{C}^n$ will be denoted

$$\mathcal{V}_n = \Big\{z = (z_1, \dots, z_n)^T \in \mathbb{C}^n : z^* z = \sum_{i=1}^n |z_i|^2 = 1\Big\} \tag{2.2}$$

Every unit vector $z \in \mathcal{V}_n$ defines a pure state $\rho = zz^*$ satisfying $\rho^2 = \rho$. The set of unit vectors
$\mathcal{V}_n$ is identified with the real $(2n-1)$-dimensional sphere $S^{2n-1}$, and hence carries a unique uniform
probability measure which we denote $\sigma_n$.

The set of unitary matrices in $\mathcal{M}_n$ is denoted

$$\mathcal{U}(n) = \{U \in \mathcal{M}_n : UU^* = I\} \tag{2.3}$$

We will write $\mathcal{H}_n$ for the normalized Haar measure on $\mathcal{U}(n)$.

A channel is a completely positive trace-preserving map $\Phi : \mathcal{M}_n \to \mathcal{M}_m$. Recall the definition
of random unitary channel (1.10):

$$\Phi(\rho) = \sum_{i=1}^d w_i\, U_i \rho U_i^* \tag{2.4}$$

The set of all random unitary channels on $\mathcal{M}_n$ with d summands will be denoted $\mathcal{R}_d(n)$. Given a
channel $\Phi \in \mathcal{R}_d(n)$ the complementary or conjugate channel $\Phi^C : \mathcal{M}_n \to \mathcal{M}_d$ is defined by [16], [22]

$$\Phi^C(\rho) = \sum_{i,j=1}^d \sqrt{w_i w_j}\; \mathrm{Tr}(\rho\, U_j^* U_i)\, |i\rangle\langle j| \tag{2.5}$$

As is well-known, for any input state $\rho$ the output states $\Phi(\rho)$ and $\Phi^C(\rho)$ are related by

$$\Phi(\rho) = \mathrm{Tr}_2\, W \rho W^*, \qquad \Phi^C(\rho) = \mathrm{Tr}_1\, W \rho W^* \tag{2.6}$$

Here, $W : \mathbb{C}^n \to \mathbb{C}^{nd}$ is a partial isometry. Also $\mathrm{Tr}_2$ denotes the partial trace over the state space
of the environment, and $\mathrm{Tr}_1$ denotes the partial trace over the state space of the system. When
$\rho = zz^*$ is a pure state, the matrices $\Phi(zz^*)$ and $\Phi^C(zz^*)$ are partial traces of the same pure state,
and thus have the same non-zero spectrum and the same entropy. Therefore $S_{\min}(\Phi) = S_{\min}(\Phi^C)$.
For the purposes of constructing the counterexample it is convenient to work with both $\Phi$ and $\Phi^C$.
In particular, we are interested in the cases where W consists of rescaled unitary block matrices:

$$W = \begin{pmatrix} \sqrt{w_1}\, U_1 \\ \vdots \\ \sqrt{w_d}\, U_d \end{pmatrix} \tag{2.7}$$

Note that $\sum_i w_i = 1$ as W is a partial isometry. We define a measure on this subset of partial
isometries, in Section 3.4, as the product of Haar measures and a particular measure on the simplex.

The complex conjugate channel $\bar\Phi$ is defined by

$$\bar\Phi(\rho) = \sum_{i=1}^d w_i\, \bar U_i \rho\, \bar U_i^* = \sum_{i=1}^d w_i\, \bar U_i \rho\, U_i^T \tag{2.8}$$

Again note that $\Phi$ and $\bar\Phi$ have identical minimum output entropies.

2.2 The main result 

Following the work of Winter and Hayden [14], the counterexample is taken to be a product channel
of the form $\Phi \otimes \bar\Phi$ where $\Phi$ is a random unitary channel. Hastings first proves the following universal
upper bound for the minimum output entropy of such a product.

Lemma 1 For any $\Phi \in \mathcal{R}_d(n)$

$$S_{\min}(\Phi \otimes \bar\Phi) \le 2 \log d - \frac{\log d}{d} \tag{2.9}$$

Lemma 1 will be proved in Section 5.1. The counterexample is found by proving the existence
of a random unitary channel $\Phi$ whose minimum output entropy is greater than one half of this
upper bound, that is greater than $\log d - \log d / 2d$. For such a channel it will follow that

$$S_{\min}(\Phi \otimes \bar\Phi) \le 2 \log d - \frac{\log d}{d} \tag{2.10}$$
$$< 2\, S_{\min}(\Phi) \tag{2.11}$$
$$= S_{\min}(\Phi) + S_{\min}(\bar\Phi) \tag{2.12}$$

and this will provide the counterexample to Conjecture 3. Hastings proved the existence of such
channels using a combination of probabilistic arguments and estimates involving the distribution
of the reduced density matrix of a random pure state. The next Theorem is a precise statement of
Hastings' result.
Theorem 2 There is $h_{\min} < \infty$, such that for all $h > h_{\min}$, all d satisfying $d \log d > h$, and all n
sufficiently large, there is $\Phi \in \mathcal{R}_d(n)$ satisfying

$$S_{\min}(\Phi) > \log d - \frac{h}{d} \tag{2.13}$$

By taking d large enough so that $2 h_{\min} < \log d$, we deduce from Theorem 2 that there is a
channel $\Phi$ satisfying

$$S_{\min}(\Phi) > \log d - \frac{\log d}{2d} \tag{2.14}$$

and this establishes the existence of counterexamples for Conjecture 3. In fact the proof will show
that as $d, n \to \infty$, the probability that a randomly chosen channel in $\mathcal{R}_d(n)$ will satisfy the bound
(2.13) approaches one.

It would be interesting to determine the set of integers (n, d) for which there are random unitary
channels in $\mathcal{R}_d(n)$ violating additivity, and in particular to find the smallest dimensions which allow
violations, as well as the size of the largest possible violation. Following this line of reasoning we
define

$$d_{\min} = \inf\Big\{d : \exists\, n,\ \exists\, \Phi \in \mathcal{R}_d(n) \text{ s.t. } S_{\min}(\Phi) > \log d - \frac{\log d}{2d}\Big\}$$
$$n_{\min} = \inf\Big\{n : \exists\, d,\ \exists\, \Phi \in \mathcal{R}_d(n) \text{ s.t. } S_{\min}(\Phi) > \log d - \frac{\log d}{2d}\Big\}$$
$$\Delta S_{\max} = \sup_{n,d}\ \sup_{\Phi \in \mathcal{R}_d(n)} \big(S_{\min}(\Phi) + S_{\min}(\bar\Phi) - S_{\min}(\Phi \otimes \bar\Phi)\big) \tag{2.15}$$

The next result gives some bounds on these quantities.

Proposition 3

$$d_{\min} < 3.9 \times 10^{4}$$
$$n_{\min} < 7.8 \times 10^{32}$$
$$\Delta S_{\max} > 9.5 \times 10^{-6}$$

Proposition 3 will be proved in Section 4.6. The bounds in Proposition 3 are surely not optimal,
however they may indicate the delicacy of the non-additivity effect for this class of channels. It
would certainly be interesting to tune the estimates in this paper in order to improve the bounds
in Proposition 3, or even better to find a different class of channels where the effect is larger.



3 Background on random states and channels 

As mentioned above, the proof of Theorem 2 relies on probabilistic arguments, involving
distributions of pure states and random unitary channels. The next sections explain the distributions
which play a role in the proof.
3.1 Probability distributions for states

Recall that $\mathcal{V}_n$ is the set of unit vectors in $\mathbb{C}^n$. This set carries a natural uniform measure $\sigma_n$,
namely the uniform measure on the (real) $(2n-1)$-dimensional sphere. If $\mathbb{C}^{dn} = \mathbb{C}^n \otimes \mathbb{C}^d$ is a
product space, then a unit vector $z \in \mathcal{V}_{dn}$ can be written as an $n \times d$ matrix M, with entries

$$M_{ij}(z) = z_{(i-1)d+j}, \qquad i = 1, \dots, n,\ \ j = 1, \dots, d \tag{3.1}$$

satisfying $\mathrm{Tr}\, M^* M = \sum_{i,j} |M_{ij}|^2 = 1$. Define the map $G : \mathcal{V}_{dn} \to \mathcal{M}_d$ by

$$G(z) = M(z)^* M(z) \tag{3.2}$$

It follows that $G(z) \ge 0$ and $\mathrm{Tr}\, G(z) = 1$, and hence the image of G lies in $\mathcal{S}_d$ (the set of
d-dimensional states). Since z is a random vector (with distribution $\sigma_{dn}$) it follows that $G(z)$ is a
$\mathcal{S}_d$-valued random variable, or more simply a random state. Its distribution has been studied in
many other contexts (see for example [13]) and it plays a key role in the proof here.
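The construction of the random state $G(z)$ can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the uniform unit vector is sampled by normalizing independent complex Gaussians (a standard way to sample the uniform measure on the sphere), indices are 0-based, and the helper names and the values n = 200, d = 3 are hypothetical.

```python
import random

def random_unit_vector(dim, rng):
    """Uniform random unit vector in C^dim, obtained by normalizing complex Gaussians."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
    norm = sum(abs(c) ** 2 for c in v) ** 0.5
    return [c / norm for c in v]

def reduced_state(z, n, d):
    """G(z) = M^* M, where M is the n x d matrix with M[i][j] = z[i*d + j] (0-based)."""
    M = [[z[i * d + j] for j in range(d)] for i in range(n)]
    return [[sum(M[i][a].conjugate() * M[i][b] for i in range(n)) for b in range(d)]
            for a in range(d)]

rng = random.Random(0)
n, d = 200, 3
G = reduced_state(random_unit_vector(n * d, rng), n, d)
trace = sum(G[a][a].real for a in range(d))
print("trace:", trace)
print("diagonal:", [G[a][a].real for a in range(d)])
```

With n much larger than d the diagonal entries cluster near 1/d, in line with the heuristic that the reduced state of a random pure state is close to maximally mixed.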

3.2 Probability distributions on the simplex 

Let $\Delta_d$ denote the simplex of d-dimensional probability distributions:

$$\Delta_d = \Big\{(x_1, \dots, x_d) \in \mathbb{R}^d : x_i \ge 0,\ \sum_{i=1}^d x_i = 1\Big\} \tag{3.3}$$

We define below three different probability distributions on $\Delta_d$. One is the uniform measure
inherited from $\mathbb{R}^d$, and the others are defined by the diagonal entries and the eigenvalues of $G(z)$
where z is a random unit vector in $\mathcal{V}_{dn}$.

Uniform distribution. The simplex $\Delta_d$ carries a natural measure inherited from Lebesgue measure
on $\mathbb{R}^d$: this is conveniently written as

$$\delta\Big(\sum_{i=1}^d w_i - 1\Big)\, dw_1 \cdots dw_d = \delta\Big(\sum_{i=1}^d w_i - 1\Big)\, [dw] \tag{3.4}$$

where $\delta(\cdot)$ is the Dirac $\delta$-function. Integrals with respect to this measure can be evaluated by
introducing local coordinates on $\mathbb{R}^d$ in a neighborhood of $\Delta_d$. In particular the volume of $\Delta_d$ with
respect to the measure (3.4) can be computed:

$$\int_{\Delta_d} \delta\Big(\sum_{i=1}^d w_i - 1\Big)\, [dw] = \frac{1}{(d-1)!} \tag{3.5}$$



Diagonal distribution $\nu_{d,n}$. Let $z \in \mathcal{V}_{dn}$ be a random unit vector in $\mathbb{C}^n \otimes \mathbb{C}^d$. The joint distribution
of the diagonal entries $(G_{11}(z), \dots, G_{dd}(z))$ will be denoted $\nu_{d,n}$. It is possible to find an explicit
formula for the density of $\nu_{d,n}$, however we will not need it in this paper. It is sufficient to note
that a collection of d random variables $Y_1, \dots, Y_d$ have the joint distribution $\nu_{d,n}$ if and only if they
can be written as

$$Y_j = \sum_{i=1}^n |z_{ij}|^2, \qquad j = 1, \dots, d \tag{3.6}$$

where $\{z_{ij}\}$ are the components of a uniform random vector on the unit sphere in $\mathbb{C}^n \otimes \mathbb{C}^d$. We
come back to this problem in Section 5.3.

Eigenvalue distribution $\mu_{d,n}$. As noted above the eigenvalues of $G(z)$ are non-negative and sum to
one.$^1$ However the eigenvalues are not ordered and so define a map not into $\Delta_d$ but rather into
the quotient $\Delta_d / \Sigma_d$ where $\Sigma_d$ is the symmetric group. Thus when $z \in \mathcal{V}_{dn}$ is a random vector the
eigenvalues of $G(z)$ are $\Delta_d / \Sigma_d$-valued random variables. However it is convenient to use a joint
density for the eigenvalue distribution on $\Delta_d$, with the understanding that it should be evaluated
only on events which are invariant under $\Sigma_d$. This density is known explicitly [21], [29]: for any
event $A \subset \Delta_d$

$$\mu_{d,n}(A) = Z(n,d)^{-1} \int_A \prod_{1 \le i < j \le d} (w_i - w_j)^2\ \prod_{i=1}^d w_i^{\,n-d}\ \delta\Big(\sum_{i=1}^d w_i - 1\Big)\, [dw] \tag{3.7}$$

where $Z(n,d)$ is a normalization factor. The distribution $\mu_{d,n}$ plays an essential role in the proof of
Theorem 2. Explicit expressions for $Z(n,d)$ are known [29]. In Appendix A we derive the following
bound: for n sufficiently large,

$$Z(n,d)^{-1} \le n^{d^2}\, d^{\,d(n-d)} \tag{3.8}$$



3.3 Estimates for $\mu_{d,n}$

Define the function

$$F(x) = -\log x + x - 1 \tag{3.9}$$

Lemma 4 For all d, for n sufficiently large, and for any event $A \subset \Delta_d$,

$$\mu_{d,n}(A) \le \int_A \exp\Big[d^2 \log n - (n-d) \sum_{i=1}^d F(d w_i)\Big]\ \delta\Big(\sum_{i=1}^d w_i - 1\Big)\, [dw] \tag{3.10}$$

This Lemma will be proved in Section 5.2. Using (3.5) we immediately get the following bound.



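$^1$ $G(z)$ gives the complex Wishart matrix when $z \in \mathbb{C}^{nd}$ with each entry $z_{ij}$ being an IID complex normal
random variable. In that case the eigenvalue distribution was shown to be proportional to
$\prod_{1 \le i < j \le d} (w_i - w_j)^2 \prod_{i=1}^d w_i^{\,n-d} e^{-w_i}$, for example, in [5].
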
Corollary 5 For all d, for n sufficiently large, and for any event $A \subset \Delta_d$,

$$\mu_{d,n}(A) \le \exp\Big[d^2 \log n - \log (d-1)! - (n-d) \inf_{w \in A} \sum_{i=1}^d F(d w_i)\Big] \tag{3.11}$$

Note that $F(x)$ is convex, and also $F(1) = F'(1) = 0$. The Taylor expansion around 1 gives

$$F(1 + d\,\delta w) = \frac{1}{2}\, d^2 (\delta w)^2 + R$$

where the remainder is 



R 



^ (1 + d6)-^ id6wf. 



(3.12) 



(3.13) 



and $\delta$ is some value between 0 and $\delta w$. Note that $-1/d \le \delta w \le (d-1)/d$ as $0 \le w \le 1$. Also,
$R \ge 0$ if $\delta w \le 0$. When $\delta w > 0$,



0> R> --d%6w?. 
6 



(3.14) 



Recall that $F(x) \ge 0$, so we have the bound $F(d w_i) \ge 0$ for all i. Thus feeding (3.12) into
Corollary 5 gives the following estimate, which will be used in Section 5.4.



Corollary 6 For all d, for n sufficiently large, and for any i = 1, . . . , d,

$$\mu_{d,n}\Big\{w : \Big|w_i - \frac{1}{d}\Big| > t\Big\} \le \exp\Big[d^2 \log n - \log (d-1)! - \frac{n-d}{2}\, d^2 t^2 + \frac{n-d}{6}\, d^3 t^3\Big] \tag{3.15}$$

3.4 Probability distribution for random unitary channels


A random unitary channel (1.10) is determined by the coefficients $w_i$ and the unitary matrices $U_i$.
Thus the set of random unitary channels $\mathcal{R}_d(n)$ is naturally identified with $\Delta_d \times \mathcal{U}(n)^d$. Recall the
distribution $\nu_{d,n}$ defined in Section 3.2 for the diagonal entries of $G(z)$, and the Haar measure $\mathcal{H}_n$
defined on $\mathcal{U}(n)$. We define the following product probability measure on $\mathcal{R}_d(n)$:

$$\mathcal{P}_{d,n} = \nu_{d,n} \times \mathcal{H}_n \times \cdots \times \mathcal{H}_n \tag{3.16}$$

where $\mathcal{H}_n \times \cdots \times \mathcal{H}_n$ is the d-fold product Haar measure on $\mathcal{U}(n)^d$. Using the measure $\mathcal{P}_{d,n}$ on
$\mathcal{R}_d(n)$ means that the unitaries $U_i$ are selected randomly and independently, while the coefficients $w_j$
have the joint distribution $\nu_{d,n}$, and thus can be written in the form (3.6) where $\{z_{ij}\}$ (i = 1, . . . , n;
j = 1, . . . , d) are the components of a random unit vector in $\mathcal{V}_{nd}$.

Recall the definition (2.5) of the conjugate channel. Define the map

$$H : \mathcal{R}_d(n) \times \mathcal{V}_n \to \mathcal{M}_d, \qquad (\Phi, z) \mapsto \Phi^C(zz^*) \tag{3.17}$$

Recall the definition (3.2) of the map $G : \mathcal{V}_{dn} \to \mathcal{M}_d$. The following relation between the
distributions $\mathcal{P}_{d,n}$, $\sigma_n$ and $\sigma_{dn}$ is crucial to the proof.

Lemma 7

$$H_*(\mathcal{P}_{d,n} \times \sigma_n) = G_*(\sigma_{dn}) \tag{3.18}$$

Lemma 7 will be proved in Section 5.3. It implies that if $\Phi$ is chosen randomly according to the
measure $\mathcal{P}_{d,n}$ and z is chosen randomly and uniformly in $\mathcal{V}_n$, then the eigenvalues of the matrix
$\Phi^C(zz^*)$ will have the distribution $\mu_{d,n}$.



4 Proof of Theorem 2

The main idea of the proof is to isolate some properties of random unitary channels which are
typical for large values n and d. These properties will then be used to prove that large minimum
output entropy is also typical for random unitary channels when n, d are large.

Recall that the environment dimension d will be chosen to be much smaller than the input
dimension n. As the identity (2.6) shows, selecting a channel in $\mathcal{R}_d(n)$ corresponds to selecting
a subspace of dimension n in the product space $\mathbb{C}^n \otimes \mathbb{C}^d$. The structure of random bipartite
subspaces was analyzed in the paper [13], and it was shown that in some circumstances most states
in a randomly selected subspace will be close to maximally entangled. In such a situation the
reduced density matrix of a randomly selected output state $\Phi^C(zz^*)$ will be close to the maximally
mixed state $I/d$, and hence its entropy will be close to $\log d$. Although this observation plays an
essential role in Hastings' proof, the methods used in [13] do not directly yield the bounds needed.

4.1 Definition of the typical channel

A channel $\Phi$ will be called typical if $\Phi^C$ maps at least one half of input states into a small ball
$B_d(n)$ centered at the maximally mixed output state $I/d$. The size of the small ball in question involves
a numerical parameter b and is defined as follows:

(4.1)

Definition 8 A random unitary channel $\Phi$ is called typical if with probability at least 1/2 a
randomly chosen input state is mapped by $\Phi^C$ into the set $B_d(n)$. The set of typical channels is denoted
T:

$$T = \Big\{\Phi \in \mathcal{R}_d(n) : \sigma_n\big(z : \Phi^C(zz^*) \in B_d(n)\big) \ge \frac{1}{2}\Big\} \tag{4.2}$$


As the next result shows, for large n most channels are typical.

Lemma 9 For every $b > \sqrt{3}$ there is $a > 0$ such that for n sufficiently large, and for all d

$$\mathcal{P}_{d,n}(T^c) \le \frac{2d}{(d-1)!}\, \exp[-a\, d^2 \log n] \tag{4.3}$$

Thus if $b > \sqrt{3}$, then as $n \to \infty$ with high probability a randomly chosen channel will lie in the
set T. In particular $\mathcal{P}_{d,n}(T^c) < 1$ for n sufficiently large. The number a can be chosen to satisfy

a = —- ^ - 1 (4.4) 

671 

The dimension n must be large enough so that the right side of (4.4) is positive, and also so that
n > b'^(P logn (this is a technical condition needed in the proof, see Section [5^ . 

The second property of a typical channel $\Phi$ is the existence of a 'tube' of output states
surrounding $\Phi^C(zz^*)$ for every input state $z \in \mathcal{V}_n$. This property is used to eliminate the possibility
of isolated output states with low entropy: if for some z the output entropy $S(\Phi^C(zz^*))$ is small,
then there is a nonzero fraction of input states whose outputs also have low entropy. In order to
define the tube we first construct a line segment $\Gamma(\rho)$ pointing from a general state $\rho$ toward the
maximally mixed state $I/d$. The length of the segment depends on a parameter $\gamma$, which satisfies
$0 < \gamma < 1$:

$$\Gamma(\rho) = \Big\{ r \rho + (1-r) \frac{I}{d} : \gamma \le r \le 1 \Big\} \tag{4.5}$$

The tube at $\rho$ is defined to be the set of states which lie within a small distance of the set
$\Gamma(\rho)$, and thus form a thickened line segment pointing from $\rho$ toward the maximally mixed state.
The definition of 'small' here depends on the size of the ball $B_d(n)$, and also on another numerical
parameter t.

Definition 10 Let $\rho \in \mathcal{S}_d$, then the Tube at $\rho$ is defined as

$$\mathrm{Tube}(\rho) = \Big\{\theta \in \mathcal{S}_d : \mathrm{dist}(\theta, \Gamma(\rho)) \le t \sqrt{\frac{d \log n}{n}}\Big\}, \qquad \mathrm{dist}(\theta, \Gamma(\rho)) = \inf_{\tau \in \Gamma(\rho)} \|\theta - \tau\|_\infty \tag{4.6}$$

The next result shows that for a channel $\Phi$ in the typical set T, and for any state $\rho = \Phi^C(zz^*)$ in
the image of $\Phi^C$, there is a uniform lower bound for the probability that a randomly chosen state
belongs to the tube at $\rho$. As explained before, this means that an output state $\Phi^C(zz^*)$ cannot be
too isolated from the other output states.

Lemma 11 For all $d \ge 3$ there is $\beta > 0$ such that for n sufficiently large, for all $t > b + 4$, and
for all $\Phi \in T$ and $\rho \in \mathrm{Im}(\Phi^C)$,

$$\sigma_n\big(z : \Phi^C(zz^*) \in \mathrm{Tube}(\rho)\big) \ge \beta\, (1-\gamma)^{n-1} \tag{4.7}$$

Lemma 11 will be proved in Section 5.5. The number $\beta$ is given by the following expression:

/3=i-(<i^ + 2)(l-^J^)""' (4.8) 
It can be easily seen that for all $d \ge 3$ the right side of (4.8) is positive for n sufficiently large.
4.2 Definition of the low-entropy events E

Define the set of channels whose minimum output entropy does not satisfy our requirements for a
violation:

$$C_{d,n} = \Big\{\Phi \in \mathcal{R}_d(n) : S_{\min}(\Phi) \le \log d - \frac{h}{d}\Big\} \tag{4.9}$$

The goal is to show that for d, n and h sufficiently large we have $\mathcal{P}_{d,n}(C_{d,n}) < 1$, implying that
$\mathcal{P}_{d,n}(C_{d,n}^c) > 0$, and thus that there exist random unitary channels with $S_{\min}(\Phi) > \log d - h/d$.
The proof will hold for all h, d sufficiently large, and thus by taking $\log d > 2h$ this will provide
a counterexample to additivity. The method is to find useful upper and lower bounds for the
probability of a particular event E in $\mathcal{R}_d(n) \times \mathcal{V}_n$. The event E is chosen to contain all the pairs
$(\Phi, z)$ where $\Phi^C(zz^*)$ lies in a tube connected to a state of low entropy. This set of tubes is defined
by

$$J = \bigcup \Big\{\mathrm{Tube}(\rho) : \rho \in \mathcal{S}_d,\ S(\rho) \le \log d - \frac{h}{d}\Big\} \tag{4.10}$$

Then the main event of interest for us is the following subset of $\mathcal{R}_d(n) \times \mathcal{V}_n$:

$$E = \{(\Phi, z) : \Phi^C(zz^*) \in J\} = H^{-1}(J) \tag{4.11}$$

where H is the map defined in (3.17). The proof will proceed by proving upper and lower bounds
for the probability of E, that is $(\mathcal{P}_{d,n} \times \sigma_n)(E)$. These bounds will hold for any $0 < \gamma < 1$; the
parameter $\gamma$ will be 'tuned' at the end in order to derive an estimate for the minimal size $h_{\min}$
needed for the counterexample. As noted the construction works for any values of the parameters
b, t satisfying $b > \sqrt{3}$ and $t > b + 4$. The sizes of b and t do not play a crucial role, and they can
be set to the values b = 2 and t = 6 without changing anything in the proof.

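4.3 The upper bound for Prob(E)

Note that by Lemma 7

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) = (\mathcal{P}_{d,n} \times \sigma_n)(H^{-1}(J)) = H_*(\mathcal{P}_{d,n} \times \sigma_n)(J) = G_*(\sigma_{dn})(J) \tag{4.12}$$

Let $\rho$ be a fixed state in the set of tubes J. Then by definition there is a state $\tau \in \mathcal{S}_d$ with low
entropy such that $\rho$ lies in the tube at $\tau$. Thus for some r satisfying $\gamma \le r \le 1$

$$\Big\|\rho - \Big(r \tau + (1-r) \frac{I}{d}\Big)\Big\|_\infty \le t \sqrt{\frac{d \log n}{n}}, \qquad S(\tau) \le \log d - \frac{h}{d} \tag{4.13}$$

Letting $q_i, p_i$ denote the eigenvalues of $\rho, \tau$ respectively, it follows that

$$q_i = r p_i + (1-r) \frac{1}{d} + \epsilon_i, \qquad i = 1, \dots, d \tag{4.14}$$

where $p_i, \epsilon_i$ satisfy

$$-\sum_{i=1}^d p_i \log p_i \le \log d - \frac{h}{d}, \qquad \sum_{i=1}^d \epsilon_i = 0 \tag{4.15}$$

Weyl's inequality and (4.13) imply that

$$|\epsilon_i| \le t \sqrt{\frac{d \log n}{n}} \tag{4.16}$$

The entropy condition (4.15) can be written as

$$\sum_{i=1}^d p_i d \log(p_i d) = \sum_{i=1}^d \big(p_i d \log(p_i d) - p_i d + 1\big) \ge h \tag{4.17}$$

Define the function

$$f(x) = x \log x - x + 1 \tag{4.18}$$

Lemma 12

$$\sup_{x \ge 0,\ \gamma \le r \le 1} \frac{f(x)}{f(rx + 1 - r)} = \frac{f(0)}{f(1-\gamma)} = \frac{1}{f(1-\gamma)} \tag{4.19}$$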
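Lemma 12 can be spot-checked numerically. The sketch below is not a proof: it evaluates the ratio $f(x)/f(rx + 1 - r)$ on a finite grid of $x \ge 0$ and $\gamma \le r \le 1$ and compares the largest value found with $1/f(1-\gamma)$. The grid sizes, the cutoff `x_max` and the choice $\gamma = 0.762$ are our own assumptions for the illustration.

```python
import math

def f(x):
    """f(x) = x log x - x + 1, extended by continuity to f(0) = 1."""
    return 1.0 if x == 0.0 else x * math.log(x) - x + 1.0

def max_ratio_on_grid(gamma, x_max=50.0, nx=2000, nr=200):
    """Largest value of f(x) / f(r x + 1 - r) over a grid of x in [0, x_max], r in [gamma, 1]."""
    best = 0.0
    for k in range(nr + 1):
        r = gamma + (1.0 - gamma) * k / nr
        for j in range(nx + 1):
            x = x_max * j / nx
            denom = f(r * x + 1.0 - r)
            if denom > 1e-12:  # skip x = 1, where numerator and denominator both vanish
                best = max(best, f(x) / denom)
    return best

gamma = 0.762
bound = 1.0 / f(1.0 - gamma)
print(max_ratio_on_grid(gamma), bound)
```

On this grid the maximum is attained at the corner x = 0, r = $\gamma$, which is where the Lemma places the supremum.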



Lemma 12 will be proved in Section 5.6. Recall (4.14) and define

$$z_i = q_i - \epsilon_i = r p_i + (1-r) \frac{1}{d} \tag{4.20}$$

Then Lemma 12 implies that for each i = 1, . . . , d,

$$p_i d \log(p_i d) - p_i d + 1 = f(p_i d) \le \frac{1}{f(1-\gamma)}\, f(z_i d) \tag{4.21}$$

Therefore from (4.17) it follows that

$$\sum_{i=1}^d \big(z_i d \log(z_i d) - z_i d + 1\big) = \sum_{i=1}^d f(z_i d) \ge h f(1-\gamma) \tag{4.22}$$

We will use the Fannes inequality [8], [1] to bound the difference between the entropies of $z_i$
and $q_i$:

$$\Big| -\sum_{i=1}^d z_i \log z_i + \sum_{i=1}^d q_i \log q_i \Big| \le \epsilon \Big(\log d + \log \frac{1}{\epsilon}\Big) \tag{4.23}$$
where

$$\epsilon = \sum_{i=1}^d |z_i - q_i| \le d\, t \sqrt{\frac{d \log n}{n}} \tag{4.24}$$

Define

$$\eta = d\, \epsilon \Big(\log d + \log \frac{1}{\epsilon}\Big) \tag{4.25}$$

Note that for all d and t, $\epsilon \to 0$ as $n \to \infty$, and hence also $\eta \to 0$ as $n \to \infty$. From (4.23) and
(4.22) we deduce

$$\sum_{i=1}^d f(q_i d) \ge h f(1-\gamma) - \eta \tag{4.26}$$

To summarize what we have shown so far: if $\rho \in J$ has eigenvalues $(q_1, \dots, q_d)$ then (4.26)
holds. Thus we may upper bound the probability (4.12) by the probability of the state $\rho$ satisfying
the inequality (4.26). Since this event depends only on the eigenvalues of $\rho$, we obtain

$$G_*(\sigma_{dn})(J) \le \mu_{d,n}\Big\{q : \sum_{i=1}^d f(q_i d) \ge h f(1-\gamma) - \eta\Big\} \tag{4.27}$$




This probability is estimated using the bound (3.11): given a positive number $x < d \log d$, define

$$M_d(x) = \inf\Big\{\sum_{i=1}^d F(q_i d) : \sum_{i=1}^d f(q_i d) \ge x\Big\} \tag{4.28}$$

where $F(x) = -\log x + x - 1$ as defined in (3.9). Then from (4.27) and (3.11) we deduce

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) = G_*(\sigma_{dn})(J) \le \exp\big[d^2 \log n - \log (d-1)! - (n-d)\, M_d(h f(1-\gamma) - \eta)\big] \tag{4.29}$$

The next Lemma gives a lower bound for $M_d(x)$ which is not optimal but is sufficient for our
purposes.

Lemma 13 The function $M_d(x)$ is increasing. Suppose that $2e^2 \le x < d \log d$. Then

$$M_d(x) \ge \log(x - 1) - \log(2e^2 - 1) \tag{4.30}$$

Lemma 13 will be proved in Section 5.7. Applying (4.30) to (4.29) gives

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) \le \exp\Big[d^2 \log n - \log (d-1)! - (n-d) \log \frac{h f(1-\gamma) - \eta - 1}{2e^2 - 1}\Big] \tag{4.31}$$

where h is assumed to satisfy the bounds

$$2e^2 \le h f(1-\gamma) - \eta < d \log d \tag{4.32}$$
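4.4 The lower bound for Prob(E)

First we write

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) = \mathbb{E}_\Phi\big[\sigma_n(z : \Phi^C(zz^*) \in J)\big] \ge \mathbb{E}_\Phi\big[1_{C_{d,n} \cap T}\ \sigma_n(z : \Phi^C(zz^*) \in J)\big]$$

where $\mathbb{E}_\Phi$ denotes expectation over $\mathcal{R}_d(n)$ with respect to the measure $\mathcal{P}_{d,n}$, and $1_{C_{d,n} \cap T}$ is the
characteristic function of the event $C_{d,n} \cap T$. Given that $\Phi \in C_{d,n}$ there is a unit vector $v \in \mathbb{C}^n$ such that

$$S(\Phi^C(vv^*)) \le \log d - \frac{h}{d} \tag{4.33}$$

Since $\mathrm{Tube}(\Phi^C(vv^*)) \subset J$ it follows that

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) \ge \mathbb{E}_\Phi\big[1_{C_{d,n} \cap T}\ \sigma_n\big(z : \Phi^C(zz^*) \in \mathrm{Tube}(\Phi^C(vv^*))\big)\big] \tag{4.34}$$

Applying Lemma 11 to (4.34) gives

$$(\mathcal{P}_{d,n} \times \sigma_n)(E) \ge \beta (1-\gamma)^{n-1}\ \mathbb{E}_\Phi[1_{C_{d,n} \cap T}] \tag{4.35}$$
$$= \beta (1-\gamma)^{n-1}\ \mathcal{P}_{d,n}(C_{d,n} \cap T) \tag{4.36}$$
$$\ge \beta (1-\gamma)^{n-1}\ \big(\mathcal{P}_{d,n}(C_{d,n}) - \mathcal{P}_{d,n}(T^c)\big) \tag{4.37}$$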

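4.5 Combining the bounds for Prob(E) and finishing the proof

Putting together the upper and lower bounds for $(\mathcal{P}_{d,n} \times \sigma_n)(E)$ and using Lemma 9 produces the
following bound: for all $d \ge 3$, for all $b > \sqrt{3}$ and $t > b + 4$, for all $0 < \gamma < 1$, for h, d satisfying
(4.32), and for n sufficiently large

$$\mathcal{P}_{d,n}(C_{d,n}) \le \mathcal{P}_{d,n}(T^c) + \frac{1}{\beta} \Big(\frac{1}{1-\gamma}\Big)^{n-1} (\mathcal{P}_{d,n} \times \sigma_n)(E)$$
$$\le \frac{2d}{(d-1)!} \exp[-a\, d^2 \log n] + \frac{1}{\beta} \Big(\frac{1}{1-\gamma}\Big)^{n-1} \exp\big[d^2 \log n - \log (d-1)! - (n-d) \log \hat h\big]$$
$$= \frac{2d}{(d-1)!} \exp[-a\, d^2 \log n] + \frac{1-\gamma}{\beta\, (d-1)!} \exp\big[d^2 \log n + d \log \hat h - n \log\big((1-\gamma) \hat h\big)\big] \tag{4.38}$$

where $\hat h = (h f(1-\gamma) - \eta - 1)/(2e^2 - 1)$. Define

$$h_{\min} = \frac{2e^2 - \gamma}{(1-\gamma)\, f(1-\gamma)} \tag{4.39}$$
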
(l-7)/(l-7) ^ 
(note that $h_{\min}$ satisfies the lower bound in (4.32)). As $n \to \infty$ the parameter $\eta$ approaches zero,
and therefore for $h > h_{\min}$ the second term on the right side of (4.38) is controlled by the factor

$$\exp\Big[-n \log \frac{(1-\gamma)\,(h f(1-\gamma) - 1)}{2e^2 - 1}\Big] = \exp\Big[-n \log \frac{h f(1-\gamma) - 1}{h_{\min} f(1-\gamma) - 1}\Big] \tag{4.40}$$

The first term on the right side of (4.38) approaches zero as $n \to \infty$, therefore (4.40) implies that
for $h > h_{\min}$,

$$\mathcal{P}_{d,n}(C_{d,n}) \to 0 \quad \text{as } n \to \infty \tag{4.41}$$

Summary and conclusion. We have shown that for any $0 < \gamma < 1$, for $h > h_{\min}$ as defined in
(4.39), for any $b > \sqrt{3}$ and $t > b + 4$, for any $d \ge 3$ satisfying $d \log d > h f(1-\gamma)$ (this comes from
the second inequality in (4.32)), there is $N < \infty$ such that for all $n \ge N$ we have $\mathcal{P}_{d,n}(C_{d,n}) < 1$.
In this case we also have $\mathcal{P}_{d,n}(C_{d,n}^c) > 0$, which guarantees that the set $C_{d,n}^c$ is non-empty.
Referring to (4.9), this means that there exists a random unitary channel $\Phi$ such that

$$S_{\min}(\Phi) > \log d - \frac{h}{d} \tag{4.42}$$



4.6 Optimizing the bounds for Prob(E) and the proof of Proposition 3

First consider the value $h_{\min}$ defined in (4.39). Varying $\gamma$ shows that the right side achieves its
minimum value at $\gamma = 0.72$. In order to achieve a counterexample we need $\log d > 2h$, so this
implies the existence of counterexamples for all $d > d_0$ with

$$d_0 = \exp[2 h_{\min} + 1] \simeq \exp[276] \tag{4.43}$$
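The minimization over $\gamma$ is easy to reproduce. The Python sketch below assumes that (4.39) reads $h_{\min} = (2e^2 - \gamma)/((1-\gamma) f(1-\gamma))$ with $f(x) = x \log x - x + 1$, which is our reading of the formula; the grid of $\gamma$ values with step 0.01 is an arbitrary choice made for the illustration.

```python
import math

def f(x):
    """f(x) = x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0

def h_min(gamma):
    """Assumed form of (4.39): (2 e^2 - gamma) / ((1 - gamma) f(1 - gamma))."""
    u = 1.0 - gamma
    return (2.0 * math.e ** 2 - gamma) / (u * f(u))

# Scan gamma over 0.01, 0.02, ..., 0.99 and keep the minimizer.
grid = [k / 100 for k in range(1, 100)]
gamma_star = min(grid, key=h_min)
print(gamma_star, h_min(gamma_star), 2 * h_min(gamma_star))
```

Under this assumption the minimizer on the grid is $\gamma = 0.72$, and twice the minimum value is slightly above 276, consistent with the exponent quoted in (4.43).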



In order to get a better estimate of $d_{\min}$, we return to the bound (4.29) and look for the smallest
value of d satisfying

$$M_d\Big(\frac{f(1-\gamma)}{2} \log d\Big) + \log(1-\gamma) > 0 \tag{4.44}$$

For n sufficiently large this will yield a counterexample. This is a straightforward numerical
problem: for each $\gamma$ we find the smallest d so that

$$-\log(1-\gamma) < \inf_{z \ge 1} \Big\{ -\log z - (d-1) \log \frac{d-z}{d-1} \ :\ z \log z + (d-z) \log \frac{d-z}{d-1} = \frac{f(1-\gamma)}{2} \log d \Big\} \tag{4.45}$$

and then minimize over $\gamma$. The solution occurs at $\gamma = 0.762$ and yields d = 38578. This also
proves the first statement in Proposition 3.
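The numerical problem just described can be sketched as follows. The code assumes the condition has the form: solve $z \log z + (d-z) \log\frac{d-z}{d-1} = \frac{1}{2} f(1-\gamma) \log d$ for $z \ge 1$ (the left side is increasing in z, so bisection applies), then test whether $-\log z - (d-1)\log\frac{d-z}{d-1} + \log(1-\gamma)$ is positive. The function name `margin`, the bisection depth and the sample values of d are our own choices, not taken from the original computation.

```python
import math

def f(x):
    """f(x) = x log x - x + 1 for x > 0."""
    return x * math.log(x) - x + 1.0

def margin(d, gamma):
    """Objective of the constrained problem plus log(1 - gamma); positive means the test passes."""
    target = 0.5 * f(1.0 - gamma) * math.log(d)

    def constraint(z):
        # z log z + (d - z) log((d - z)/(d - 1)), written with log1p for accuracy
        return z * math.log(z) + (d - z) * math.log1p(-(z - 1.0) / (d - 1.0))

    lo, hi = 1.0, d * (1.0 - 1e-12)
    for _ in range(200):  # bisection: constraint(z) is increasing on [1, d)
        mid = 0.5 * (lo + hi)
        if constraint(mid) < target:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    objective = -math.log(z) - (d - 1.0) * math.log1p(-(z - 1.0) / (d - 1.0))
    return objective + math.log(1.0 - gamma)

for d in (1000, 38578, 100000):
    print(d, margin(d, 0.762))
```

A search for the smallest d with positive margin, followed by a scan over $\gamma$, would then mirror the procedure in the text; near the threshold the margin is very small, so the result is sensitive to numerical precision.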
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For the second statement we estimate the smallest value of n which yields Vd,n{Cd,n) < 1- Using 
the values 6 = 2, t = 6, 7 = 0.762, and with d = 50, 000, crude numerical estimates show that we 
can achieve this with n = d^ . This proves the second statement in Proposition [31 

For the third statement, we note from Lemma 1 that for any random unitary channel $\Phi$

$$\Delta S_{\max} \ge 2 S_{\min}(\Phi) - 2\log d + \frac{\log d}{d} \tag{4.46}$$

Thus for every $\Phi$ satisfying (4.42) we have

$$\Delta S_{\max} > \frac{\log d - 2h}{d} \tag{4.47}$$

For a fixed value $h$, the right side of (4.47) achieves its maximum value when $d = [\exp(2h+1)]$,
and this maximum value is $1/d$. Numerical calculation shows that we can achieve $M_d(f(1-\gamma)h) + \log(1-\gamma) > 0$
using the values $\gamma = 0.762$, $h = \log(38590)/2$ and $d = [\exp(2h+1)]$, and then $1/d$
yields the lower bound for $\Delta S_{\max}$ stated in Proposition 3.
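
The maximization of the right side of (4.47) over $d$ can be checked numerically. The sketch below treats $d$ as a real variable (rounding to an integer, as the bracket in the text suggests, is left out) and uses hypothetical function names; setting the derivative of $(\log d - 2h)/d$ to zero gives $\log d = 2h + 1$, at which point the value is $1/d$.

```python
import math

def violation_bound(d, h):
    """Right side of (4.47): (log d - 2h) / d."""
    return (math.log(d) - 2.0 * h) / d

def maximizing_dimension(h):
    """Real-valued maximizer, from d/dd [(log d - 2h)/d] = (1 - log d + 2h)/d^2 = 0."""
    return math.exp(2.0 * h + 1.0)

h = math.log(38590) / 2.0
d_star = maximizing_dimension(h)
print(d_star, violation_bound(d_star, h), 1.0 / d_star)
```
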

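5 Proofs of Lemmas
5.1 Proof of Lemma 1

First, note that for any unit vectors $\{|\psi_k\rangle\}$ and probability distribution $\{p_k\}$,

$$S\Big(\sum_k p_k |\psi_k\rangle\langle\psi_k|\Big) \le -\sum_k p_k \log p_k. \tag{5.1}$$

Let $|\psi\rangle$ be the maximally entangled state. Then

$$(\Phi\otimes\bar\Phi)(|\psi\rangle\langle\psi|) = \sum_{i,j=1}^d w_i w_j\,(U_i\otimes\bar U_j)|\psi\rangle\langle\psi|(U_i^*\otimes\bar U_j^*) \tag{5.2}$$

$$= \Big(\sum_{i=1}^d w_i^2\Big)|\psi\rangle\langle\psi| + \sum_{i\ne j} w_i w_j\,(U_i\otimes\bar U_j)|\psi\rangle\langle\psi|(U_i^*\otimes\bar U_j^*) \tag{5.3}$$

where we used the identity $(U_i\otimes\bar U_i)|\psi\rangle\langle\psi|(U_i^*\otimes\bar U_i^*) = |\psi\rangle\langle\psi|$ for all $i$. Hence,

$$S\big((\Phi\otimes\bar\Phi)(|\psi\rangle\langle\psi|)\big) \le -\Big(\sum_{i=1}^d w_i^2\Big)\log\Big(\sum_{i=1}^d w_i^2\Big) - \sum_{i\ne j} w_i w_j \log(w_i w_j) \tag{5.4}$$

Write $p = \sum_{i=1}^d w_i^2$, then $\sum_{i\ne j} w_i w_j = 1 - p$. Hence

$$S\big((\Phi\otimes\bar\Phi)(|\psi\rangle\langle\psi|)\big) \le -p\log p + \sup\Big\{ -\sum_{k=1}^{d^2-d} v_k\log v_k \;:\; v_k \ge 0,\ \sum_{k=1}^{d^2-d} v_k = 1-p \Big\} \tag{5.5}$$

The supremum on the right side of (5.5) is achieved with $v_k = (1-p)/(d^2-d)$ for all $k$, hence

$$S\big((\Phi\otimes\bar\Phi)(|\psi\rangle\langle\psi|)\big) \le h(p) = -p\log p - (1-p)\log\frac{1-p}{d^2-d}, \tag{5.6}$$

where $1/d \le p \le 1$. However,

$$h'(p) = -\log p + \log(1-p) - \log(d^2-d) = -\log(pd) + \log\Big(\frac{1-p}{d-1}\Big) \le 0 \tag{5.7}$$

for all $d$. This implies that the above upper bound $h(p)$ is maximized when $p = 1/d$ and the
maximum is

$$\frac{1}{d}\log d + \Big(1 - \frac{1}{d}\Big)\log(d^2) = 2\log d - \frac{1}{d}\log d. \tag{5.8}$$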
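
As a quick sanity check of the last two steps, the following sketch evaluates the bound $h(p) = -p\log p - (1-p)\log\frac{1-p}{d^2-d}$ on a grid and compares its value at $p = 1/d$ with $2\log d - \frac{1}{d}\log d$. The function names are illustrative only.

```python
import math

def h_bound(p, d):
    """h(p) = -p log p - (1 - p) log((1 - p)/(d^2 - d)), for 1/d <= p < 1."""
    return -p * math.log(p) - (1.0 - p) * math.log((1.0 - p) / (d * d - d))

def lemma1_bound(d):
    """Maximum of h over [1/d, 1]: 2 log d - (log d)/d."""
    return 2.0 * math.log(d) - math.log(d) / d

def grid(d, steps=1000):
    """Points 1/d <= p < 1, excluding the endpoint p = 1 where log(1 - p) diverges."""
    return [1.0 / d + k * (1.0 - 1.0 / d) / steps for k in range(steps)]

for d in (2, 3, 10, 100):
    values = [h_bound(p, d) for p in grid(d)]
    print(d, values[0], lemma1_bound(d), max(values))
```
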

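5.2 Proof of Lemma 4

By dropping the terms $(w_i - w_j)^2$ in (3.7) we get

$$\mu_{d,n}(A) \le Z(n,d)^{-1} \int_A \prod_{i=1}^d w_i^{\,n-d}\;\delta\Big(\sum_{i=1}^d w_i - 1\Big)\,[dw] \tag{5.9}$$

Applying the upper bound for $Z(n,d)^{-1}$ to (5.9) leads to

$$\mu_{d,n}(A) \le \int_A \exp\Big[d^2\log n + (n-d)\sum_{i=1}^d \log(d w_i)\Big]\,\delta\Big(\sum_{i=1}^d w_i - 1\Big)\,[dw] \tag{5.10}$$

Noting that, on the set where $\sum_i w_i = 1$,

$$\sum_{i=1}^d F(d w_i) = -\sum_{i=1}^d \log(d w_i) \tag{5.11}$$

the result follows.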

5.3 Proof of Lemma 7

For any event $A \subset \mathcal{M}_d$,

$$H^*(\mathcal{P}_{d,n}\times\sigma_n)(A) = \int d\sigma_n(z)\,\mathcal{P}_{d,n}\{\Phi : \Phi^C(zz^*) \in A\} \tag{5.12}$$

A random unitary channel $\Phi$ is determined by the coefficients $\{w_1,\dots,w_d\}$ and the unitary matrices
$\{U_1,\dots,U_d\}$. Given a unitary matrix $V$ define the transformation $T_V : \mathcal{R}_d(n) \to \mathcal{R}_d(n)$ by

$$T_V : \{w_1,\dots,w_d;\,U_1,\dots,U_d\} \mapsto \{w_1,\dots,w_d;\,U_1V,\dots,U_dV\} \tag{5.13}$$

The measure $\mathcal{P}_{d,n} = \nu_{d,n}\times H_n\times\cdots\times H_n$ contains the product of $d$ independent copies of Haar
measure $H_n$ on the group $\mathcal{U}(n)$. Since Haar measure is invariant under group multiplication, for any
event $C \subset \mathcal{R}_d(n)$ we have

$$\mathcal{P}_{d,n}(C) = \int_C d\nu_{d,n}(w_1,\dots,w_d)\,dH_n(U_1)\cdots dH_n(U_d)$$

$$= \int_{T_V(C)} d\nu_{d,n}(w_1,\dots,w_d)\,dH_n(U_1)\cdots dH_n(U_d)$$

$$= \mathcal{P}_{d,n}(T_V(C)) \tag{5.14}$$

Thus in particular for any $z \in \mathcal{V}_n$,

$$\mathcal{P}_{d,n}\{\Phi : \Phi^C(zz^*) \in A\} = \mathcal{P}_{d,n}\{\Phi : \Phi^C(Vz(Vz)^*) \in A\} \tag{5.15}$$

Since $\mathcal{U}(n)$ acts transitively on $\mathcal{V}_n$, (5.15) shows that the probability is independent of $z$. Hence from
(5.12) we obtain that for any fixed $z_0 \in \mathcal{V}_n$,

$$H^*(\mathcal{P}_{d,n}\times\sigma_n)(A) = \mathcal{P}_{d,n}\{\Phi : \Phi^C(z_0z_0^*) \in A\} \tag{5.16}$$

For a given channel $\Phi$ the $d\times d$ matrix $\Phi^C(z_0z_0^*)$ can be written in terms of an $n\times d$ matrix
$K(\Phi)$ as follows:

$$\Phi^C(z_0z_0^*) = K(\Phi)^*K(\Phi), \qquad K(\Phi) = \big(\sqrt{w_1}\,v_1\ \cdots\ \sqrt{w_d}\,v_d\big) \tag{5.17}$$

where for $i = 1,\dots,d$

$$v_i = U_i z_0 \tag{5.18}$$

Thus (5.16) can be written as

$$H^*(\mathcal{P}_{d,n}\times\sigma_n)(A) = \mathcal{P}_{d,n}\{\Phi : K(\Phi)^*K(\Phi) \in A\} \tag{5.19}$$

Recall from (3.2) that $G(z) = M(z)^*M(z)$ where $z \in \mathcal{V}_{nd}$ and where the $n\times d$ matrix $M(z)$ has
entries

$$M_{ij}(z) = z_{(i-1)d+j}, \qquad i = 1,\dots,n,\quad j = 1,\dots,d \tag{5.20}$$

It follows that for any event $A \subset \mathcal{M}_d$,

$$G^*(\sigma_{dn})(A) = \sigma_{dn}\{z : M(z)^*M(z) \in A\} \tag{5.21}$$

We wish to prove that $H^*(\mathcal{P}_{d,n}\times\sigma_n)(A) = G^*(\sigma_{dn})(A)$ for every event $A \subset \mathcal{M}_d$. Comparing
(5.21) and (5.19), it is sufficient to show that the $n\times d$ matrices $K(\Phi)$ and $M(z)$ have the same
distribution. We will do this by showing that the columns of $K(\Phi)$ and $M(z)$ have the same joint
distribution. Before showing the result, we need the following observation on a normally distributed
vector.

Let $Z_1,\dots,Z_m$ be IID complex valued normal random variables with mean zero and variance
one. Let $R = \sqrt{\sum_{i=1}^m |Z_i|^2}$ and define the vector

$$\xi = \frac{1}{R}\begin{pmatrix} Z_1 \\ \vdots \\ Z_m \end{pmatrix} \tag{5.22}$$

Then $R$, $\xi$ are independent, $\xi$ is a random pure state in $\mathcal{V}_m$, and $R$ has density

$$P(r) \propto e^{-r^2/2}\,r^{2m-1} \tag{5.23}$$

This result may be easily seen by transforming the joint density for $Z_1,\dots,Z_m$ to polar coordinates:

$$(2\pi)^{-m}\prod_{i=1}^m e^{-|z_i|^2/2}\,d^2z_i = (2\pi)^{-m}\,e^{-r^2/2}\,r^{2m-1}\,dr\,d\Omega \tag{5.24}$$

where $d\Omega$ is the uniform measure on $S^{2m-1}$.

We look at $M(z)$ first. Let $\{Z_{ij}\}$ ($i = 1,\dots,n$, $j = 1,\dots,d$) be a collection of IID complex
valued normal random variables with mean zero and variance one, arranged into an $n\times d$ matrix $Z$
as in (5.20). Applying the previous observation to $Z$ and also to each column of $Z$ yields

$$Z = R\,M = \big(R_1\xi_1\ \cdots\ R_d\xi_d\big) \tag{5.25}$$

Here $\{\xi_1,\dots,\xi_d\}$ are IID random unit vectors in $\mathcal{V}_n$, and $M$ is a random unit vector in $\mathcal{V}_{nd}$. The
vectors $\{\xi_1,\dots,\xi_d\}$ are independent of the numbers $R_1,\dots,R_d$. Also $R^2 = R_1^2 + \cdots + R_d^2$, hence
$\{\xi_1,\dots,\xi_d\}$ are also independent of $R$. Dividing by $R$, $M(z)$, a random unit vector in $\mathcal{V}_{nd}$, can be
reconstructed as

$$M(z) = \big(\sqrt{Y_1}\,\xi_1\ \cdots\ \sqrt{Y_d}\,\xi_d\big), \qquad Y_i = \frac{R_i^2}{R^2} \tag{5.26}$$

Note that $Y_1,\dots,Y_d$ have the joint distribution $\nu_{d,n}$, and are independent of $\{\xi_1,\dots,\xi_d\}$.
Next, turning to $K(\Phi)$, recall that

$$K(\Phi) = \big(\sqrt{w_1}\,v_1\ \cdots\ \sqrt{w_d}\,v_d\big) \tag{5.27}$$

where $v_i = U_i z_0$. Since the unitaries $U_i$ are independently and uniformly selected (this is part of
the definition of the measure $\mathcal{P}_{d,n}$), it follows that the vectors $\{v_i\}$ are IID random unit vectors in
$\mathcal{V}_n$. Furthermore the coefficients $\{w_i\}$ have the joint distribution $\nu_{d,n}$. This verifies our claim.
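
The column-wise decomposition used in this proof can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the proof: it draws an $n\times d$ matrix of complex normals whose real and imaginary parts are independent $N(0,1)$ (the convention suggested by the density written in polar coordinates above), then forms the column norms $R_j$, the unit columns $\xi_j$, the weights $Y_j = R_j^2/R^2$ and the normalized matrix $M = Z/R$. All function names are hypothetical.

```python
import math
import random

def complex_normal(rng):
    """Complex normal with independent N(0, 1) real and imaginary parts (assumed convention)."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

def decompose(n, d, rng):
    """Return (M, xi, Y): M = Z/R, unit columns xi[j], and weights Y[j] = R_j^2 / R^2."""
    Z = [[complex_normal(rng) for _ in range(d)] for _ in range(n)]  # n x d matrix
    col_norms = [math.sqrt(sum(abs(Z[i][j]) ** 2 for i in range(n))) for j in range(d)]
    R = math.sqrt(sum(r * r for r in col_norms))
    xi = [[Z[i][j] / col_norms[j] for i in range(n)] for j in range(d)]
    Y = [(r / R) ** 2 for r in col_norms]
    M = [[Z[i][j] / R for j in range(d)] for i in range(n)]
    return M, xi, Y

rng = random.Random(1)
M, xi, Y = decompose(n=8, d=3, rng=rng)
print(sum(Y))  # the weights form a probability vector
```

Each column of `M` equals $\sqrt{Y_j}\,\xi_j$, which is the same structure as the columns $\sqrt{w_j}\,v_j$ of $K(\Phi)$.
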
5.4 Proof of Lemma 9

Define the following subset of $\mathcal{R}_d(n)\times\mathcal{V}_n$:

$$K = \{(\Phi,z) : \Phi^C(zz^*) \notin B_d(n)\} = H^{-1}(B_d(n)^c) \tag{5.28}$$

where the map $H$ was defined in (3.17). Then

$$(\mathcal{P}_{d,n}\times\sigma_n)(K) = E_\Phi\big[\sigma_n(z : \Phi^C(zz^*) \notin B_d(n))\big] \ge E_\Phi\big[1_{T^c}\,\sigma_n(z : \Phi^C(zz^*) \notin B_d(n))\big] \tag{5.29}$$

where $E_\Phi$ denotes expectation over $\mathcal{R}_d(n)$ with respect to the measure $\mathcal{P}_{d,n}$, and $1_{T^c}$ is the
characteristic function of the event $T^c$. Note that if $\Phi \in T^c$ then $\sigma_n(z : \Phi^C(zz^*) \notin B_d(n)) > 1/2$,
hence

$$(\mathcal{P}_{d,n}\times\sigma_n)(K) \ge \frac12 E_\Phi[1_{T^c}] = \frac12\,\mathcal{P}_{d,n}(T^c) \tag{5.30}$$

Furthermore from Lemma 7 it follows that

$$(\mathcal{P}_{d,n}\times\sigma_n)(K) = (\mathcal{P}_{d,n}\times\sigma_n)\big(H^{-1}(B_d(n)^c)\big) = H^*(\mathcal{P}_{d,n}\times\sigma_n)(B_d(n)^c) = G^*(\sigma_{dn})(B_d(n)^c) \tag{5.31}$$

Combining these bounds shows that

$$\mathcal{P}_{d,n}(T^c) \le 2\,G^*(\sigma_{dn})(B_d(n)^c) = 2\,\mu_{d,n}\Big\{(q_1,\dots,q_d) : |q_i - 1/d| > b\sqrt{\tfrac{\log n}{n}}\ \text{ for some } i = 1,\dots,d\Big\} = 2\,\mu_{d,n}\Big(\bigcup_{i=1}^d L_i\Big) \tag{5.32}$$

where the events $L_i$ are defined by $L_i = \{(q_1,\dots,q_d) : |q_i - 1/d| > b\sqrt{\log n/n}\}$. Thus we have

$$\mathcal{P}_{d,n}(T^c) \le 2\sum_{i=1}^d \mu_{d,n}(L_i) = 2d\,\mu_{d,n}(L_1) \tag{5.33}$$

We use the bound (3.15) of Corollary 6 with $t = b\sqrt{\log n/n}$ to estimate $\mu_{d,n}(L_1)$. In addition
we assume that $n$ is large enough so that

$$dt = d\,b\sqrt{\frac{\log n}{n}} < 1 \tag{5.34}$$
and hence

Thus (5.32) gives



2 3! - 3 



(5.35) 



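$$\mathcal{P}_{d,n}(T^c) \le 2d\,\exp\Big[d^2\log n - \log(d-1)! - \frac{n-d}{3}\,d^2b^2\,\frac{\log n}{n}\Big]$$

$$= \frac{2d}{(d-1)!}\,\exp\Big[-d^2\log n\,\Big(\frac{b^2(n-d)}{3n} - 1\Big)\Big] \tag{5.36}$$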



5.5 Proof of Lemma 11

This result relies on several properties of random states. We will switch to Dirac bra and ket
notation throughout this Section, as it lends itself well to the arguments used in the proof. To
set up the notation, let $|\psi\rangle$ be a fixed state in $\mathcal{V}_n$, and let $|\theta\rangle$ be a random pure state in $\mathcal{V}_n$, with
probability distribution $\sigma_n$. Without loss of generality we assume that a basis is chosen so that
$|\psi\rangle = (1, 0, \dots, 0)^T$. We write $x = \langle\psi|\theta\rangle$, and let $|\phi\rangle$ be the state orthogonal to $|\psi\rangle$ such that

$$|\theta\rangle = x\,|\psi\rangle + \sqrt{1 - |x|^2}\;|\phi\rangle \tag{5.37}$$

Thus $|\phi\rangle$ is also a random state, defined by its relation to the uniformly random state $|\theta\rangle$ in (5.37).
The following results are proved in Appendix B.

Proposition 14 $x$ and $|\phi\rangle$ are independent. $|\phi\rangle$ is a random vector in $\mathcal{V}_{n-1}$ with distribution $\sigma_{n-1}$.
For all $0 \le t \le 1$

$$\sigma_n\{|\theta\rangle : |\langle\psi|\theta\rangle| = |x| > t\} = (1 - t^2)^{n-1} \tag{5.38}$$

Proposition 14 implies that as $n \to \infty$ the overlap $x = \langle\psi|\theta\rangle$ becomes concentrated around
zero. In other words, with high probability a randomly chosen state will be almost orthogonal to
any fixed state. As a consequence, from (5.37) it follows that $|\phi\rangle$ will be almost equal to $|\theta\rangle$. This
statement is made precise by noting that

$$\big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty = |\langle\psi|\theta\rangle| \tag{5.39}$$

Then (5.38) immediately implies that

$$\sigma_n\big(|\theta\rangle : \big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty > t\big) = (1 - t^2)^{n-1} \tag{5.40}$$



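The second property relies on the particular form of the random unitary channel, or more
precisely on the form of the complementary channel $\Phi^C$. Roughly, this property says that for any
fixed random unitary channel $\Phi$ and random state $|\theta\rangle$, with high probability the norm of the matrix
$\Phi^C(|\theta\rangle\langle\psi|)$ is small, and approaches zero as $n \to \infty$. We will prove the following bound: for any
$\Phi \in \mathcal{R}_d(n)$, and for all $0 \le t \le 1$,

$$\sigma_n\big(|\theta\rangle : \|\Phi^C(|\theta\rangle\langle\psi|)\|_2 > t\big) \le d^2(1 - t^2)^{n-1} \tag{5.41}$$

As a first step toward deriving (5.41), note that for any states $|u\rangle$ and $|v\rangle$,

$$\|\Phi^C(|u\rangle\langle v|)\|_2 = \Big(\sum_{k,l=1}^d w_k w_l\,|\langle v|U_l^*U_k|u\rangle|^2\Big)^{1/2} \le \max_{k,l}\,|\langle v|U_l^*U_k|u\rangle| \tag{5.42}$$

In particular this implies that

$$\|\Phi^C(|u\rangle\langle v|)\|_2 \le \max\big\{\,\||u\rangle\|_\infty,\ \||v\rangle\|_\infty\big\} \tag{5.43}$$

To derive (5.41) we apply (5.42) with $u = \theta$ and $v = \psi$ and deduce that

$$\sigma_n\big(|\theta\rangle : \|\Phi^C(|\theta\rangle\langle\psi|)\|_2 > t\big) \le \sigma_n\big(|\theta\rangle : \max_{k,l}|\langle\psi|U_l^*U_k|\theta\rangle| > t\big)$$

$$\le d^2\,\sigma_n\big(|\theta\rangle : |\langle\psi|U_l^*U_k|\theta\rangle| > t\big) = d^2(1 - t^2)^{n-1} \tag{5.44}$$

where the last equality follows from (5.38) together with the unitary invariance of $\sigma_n$.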

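With these ingredients in place the proof of Lemma 11 can proceed. By assumption $\Phi$ is a
random unitary channel belonging to the typical set $T$, and $\rho = \Phi^C(|\psi\rangle\langle\psi|)$ is some state in
$\mathrm{Im}(\Phi^C)$. Let $|\theta\rangle$ be a random input state, then as in (5.37) we write

$$|\theta\rangle = x\,|\psi\rangle + \sqrt{1 - |x|^2}\;|\phi\rangle$$

It follows that

$$|\theta\rangle\langle\theta| = |x|^2\,|\psi\rangle\langle\psi| + (1 - |x|^2)\,|\phi\rangle\langle\phi| + \sqrt{1 - |x|^2}\,\big(x\,|\psi\rangle\langle\phi| + \bar x\,|\phi\rangle\langle\psi|\big) \tag{5.45}$$

Write $r = |x|^2$, then (5.45) yields

$$\Phi^C(|\theta\rangle\langle\theta|) - \Big(r\,\Phi^C(|\psi\rangle\langle\psi|) + (1-r)\frac{I}{d}\Big) = (1-r)\Big(\Phi^C(|\phi\rangle\langle\phi|) - \frac{I}{d}\Big) + \sqrt{r(1-r)}\;\Phi^C\big(e^{i\xi}|\psi\rangle\langle\phi| + e^{-i\xi}|\phi\rangle\langle\psi|\big) \tag{5.46}$$

where $\xi$ is the phase of $x$. Since $r \le 1$ this implies

$$\Big\|\Phi^C(|\theta\rangle\langle\theta|) - \Big(r\,\Phi^C(|\psi\rangle\langle\psi|) + (1-r)\frac{I}{d}\Big)\Big\|_\infty \le \Big\|\Phi^C(|\phi\rangle\langle\phi|) - \frac{I}{d}\Big\|_\infty + \big\|\Phi^C(|\psi\rangle\langle\phi|)\big\|_\infty \tag{5.47}$$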
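Referring to the definition (4.6) of Tube($\rho$), recall that $\Phi^C(|\theta\rangle\langle\theta|)$ belongs to Tube($\rho$) if and only
if for some $r$ satisfying $\gamma \le r \le 1$,

$$\Big\|\Phi^C(|\theta\rangle\langle\theta|) - \Big(r\,\Phi^C(|\psi\rangle\langle\psi|) + (1-r)\frac{I}{d}\Big)\Big\|_\infty \le t\sqrt{\frac{d\log n}{n}} \tag{5.48}$$

(the set $Y(\rho)$ defined in (4.5) is closed so the infimum in (4.6) is achieved). Define the following
three events in $\mathcal{V}_n$:

$$A_1 = \big\{|\theta\rangle : r = |\langle\psi|\theta\rangle|^2 \ge \gamma\big\} \tag{5.49}$$

$$A_2 = \Big\{|\theta\rangle : \Big\|\Phi^C(|\phi\rangle\langle\phi|) - \frac{I}{d}\Big\|_\infty \le 2\sqrt{\frac{d\log d}{n}} + b\sqrt{\frac{d\log n}{n}}\Big\} \tag{5.50}$$

$$A_3 = \Big\{|\theta\rangle : \big\|\Phi^C(|\psi\rangle\langle\phi|)\big\|_\infty \le 2\sqrt{\frac{d\log d}{n}}\Big\} \tag{5.51}$$

Since $t \ge b + 4$ and $d \le n$ it follows from (5.47) and (5.48) that

$$A_1 \cap A_2 \cap A_3 \subset \big\{|\theta\rangle : \Phi^C(|\theta\rangle\langle\theta|) \in \mathrm{Tube}(\rho)\big\} \tag{5.52}$$

Furthermore by Proposition 14 $A_1$ is independent of $A_2$ and $A_3$, hence

$$\sigma_n\big(\Phi^C(|\theta\rangle\langle\theta|) \in \mathrm{Tube}(\rho)\big) \ge \sigma_n(A_1\cap A_2\cap A_3) = \sigma_n(A_1)\,\sigma_n(A_2\cap A_3) \tag{5.53}$$

Proposition 14 immediately yields

$$\sigma_n(A_1) = (1-\gamma)^{n-1} \tag{5.54}$$

From (5.53) this gives

$$\sigma_n\big(\Phi^C(|\theta\rangle\langle\theta|) \in \mathrm{Tube}(\rho)\big) \ge (1-\gamma)^{n-1}\big(1 - \sigma_n(A_2^c) - \sigma_n(A_3^c)\big) \tag{5.55}$$

In order to bound $\sigma_n(A_3^c)$ we first use (5.43) to deduce

$$\big\|\Phi^C(|\psi\rangle\langle\phi|)\big\|_\infty \le \big\|\Phi^C(|\psi\rangle\langle\phi|)\big\|_2 \le \big\|\Phi^C(|\psi\rangle\langle\theta|)\big\|_2 + \big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty \tag{5.56}$$

Thus

$$\sigma_n(A_3^c) = \sigma_n\Big(|\theta\rangle : \big\|\Phi^C(|\psi\rangle\langle\phi|)\big\|_\infty > 2\sqrt{\tfrac{d\log d}{n}}\Big)$$

$$\le \sigma_n\Big(|\theta\rangle : \big\|\Phi^C(|\psi\rangle\langle\theta|)\big\|_2 + \big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty > 2\sqrt{\tfrac{d\log d}{n}}\Big)$$

$$\le \sigma_n\Big(|\theta\rangle : \big\|\Phi^C(|\psi\rangle\langle\theta|)\big\|_2 > \sqrt{\tfrac{d\log d}{n}}\Big) + \sigma_n\Big(|\theta\rangle : \big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty > \sqrt{\tfrac{d\log d}{n}}\Big)$$

$$\le (d^2 + 1)\Big(1 - \frac{d\log d}{n}\Big)^{n-1} \tag{5.57}$$

where the last inequality follows from (5.44) and (5.40).
Turning now to $\sigma_n(A_2^c)$, note first that

$$\big\|\Phi^C(|\phi\rangle\langle\phi|) - \Phi^C(|\theta\rangle\langle\theta|)\big\|_\infty \le \big\|\Phi^C(|\phi\rangle\langle\phi|) - \Phi^C(|\phi\rangle\langle\theta|)\big\|_\infty + \big\|\Phi^C(|\phi\rangle\langle\theta|) - \Phi^C(|\theta\rangle\langle\theta|)\big\|_\infty \le 2\,\big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty \tag{5.58}$$

where we used (5.43) for the last inequality. As in (5.57) this gives

$$\sigma_n(A_2^c) = \sigma_n\Big(|\theta\rangle : \Big\|\Phi^C(|\phi\rangle\langle\phi|) - \frac{I}{d}\Big\|_\infty > 2\sqrt{\tfrac{d\log d}{n}} + b\sqrt{\tfrac{d\log n}{n}}\Big)$$

$$\le \sigma_n\Big(|\theta\rangle : 2\,\big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty + \Big\|\Phi^C(|\theta\rangle\langle\theta|) - \frac{I}{d}\Big\|_\infty > 2\sqrt{\tfrac{d\log d}{n}} + b\sqrt{\tfrac{d\log n}{n}}\Big) \tag{5.59}$$

$$\le \sigma_n\Big(|\theta\rangle : 2\,\big\|\,|\theta\rangle - |\phi\rangle\,\big\|_\infty > 2\sqrt{\tfrac{d\log d}{n}}\Big) + \sigma_n\Big(|\theta\rangle : \Big\|\Phi^C(|\theta\rangle\langle\theta|) - \frac{I}{d}\Big\|_\infty > b\sqrt{\tfrac{d\log n}{n}}\Big)$$

$$\le \Big(1 - \frac{d\log d}{n}\Big)^{n-1} + \sigma_n\Big(|\theta\rangle : \Big\|\Phi^C(|\theta\rangle\langle\theta|) - \frac{I}{d}\Big\|_\infty > b\sqrt{\tfrac{d\log n}{n}}\Big) \tag{5.60}$$

where we used (5.40) for the last inequality. By assumption $\Phi \in T$, and therefore there is a set of
input states $L$ with $\sigma_n(L) \ge 1/2$ such that

$$|\theta\rangle \in L \;\Longrightarrow\; \Big\|\Phi^C(|\theta\rangle\langle\theta|) - \frac{I}{d}\Big\|_\infty \le b\sqrt{\frac{d\log n}{n}} \tag{5.61}$$

Thus

$$\sigma_n\Big(|\theta\rangle : \Big\|\Phi^C(|\theta\rangle\langle\theta|) - \frac{I}{d}\Big\|_\infty > b\sqrt{\tfrac{d\log n}{n}}\Big) \le \frac12 \tag{5.62}$$

Putting together the bounds (5.55), (5.57), (5.60) and (5.62) we get

$$\sigma_n\big(\Phi^C(|\theta\rangle\langle\theta|) \in \mathrm{Tube}(\rho)\big) \ge (1-\gamma)^{n-1}\big(1 - \sigma_n(A_2^c) - \sigma_n(A_3^c)\big)$$

$$\ge (1-\gamma)^{n-1}\Big[1 - \frac12 - \Big(1 - \frac{d\log d}{n}\Big)^{n-1} - (d^2+1)\Big(1 - \frac{d\log d}{n}\Big)^{n-1}\Big] \tag{5.63}$$

This completes the proof, with the constant in the lower bound given by

$$\frac12 - (d^2 + 2)\Big(1 - \frac{d\log d}{n}\Big)^{n-1} \tag{5.64}$$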
5.6 Proof of Lemma 12

It is clear that $f(rx + 1 - r)$ is monotone increasing in $r$, and therefore

$$\sup_{x>0}\ \sup_{\gamma\le r\le 1}\frac{f(x)}{f(rx+1-r)} = \sup_{x>0}\frac{f(x)}{f(\gamma x+1-\gamma)} \tag{5.65}$$

The function $f(x)\,f(\gamma x+1-\gamma)^{-1}$ is analytic and decreasing at $x = 1$ for $\gamma < 1$. Thus either
the supremum in (5.65) is achieved at $x = 0$ or else there is a critical point of the function
$f(x)\,f(\gamma x+1-\gamma)^{-1}$ in the interval $(0,\infty)$. In order to rule out the second possibility, we introduce
a Lagrange multiplier and define the function

$$h(x,y,\beta) = \log f(x) - \log f(y) - \beta(\gamma x + 1 - \gamma - y) \tag{5.66}$$

To find the critical points of $h$ we solve

$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial y} = \frac{\partial h}{\partial\beta} = 0 \tag{5.67}$$

Solving for $\beta$ leads to

$$\frac{f'(x)}{f(x)} = \gamma\,\frac{f'(y)}{f(y)} \tag{5.68}$$

Since $y - 1 = \gamma(x - 1)$ and $f'(x) = \log x$, this is equivalent to

$$\frac{(x-1)\log x}{x\log x - x + 1} = \frac{(y-1)\log y}{y\log y - y + 1} \tag{5.69}$$

Direct computation shows that

$$\frac{d}{dx}\Big(\frac{(x-1)\log x}{x\log x - x + 1}\Big) = f(x)^{-2}\Big((\log x)^2 - \big(x^{1/2} - x^{-1/2}\big)^2\Big) = f(x)^{-2}(\log x)^2\Big(1 - \Big(\frac{x^{1/2} - x^{-1/2}}{\log x}\Big)^2\Big) \tag{5.70}$$

Furthermore, the function $x^{1/2} - x^{-1/2} - \log x$ is monotone increasing for all $x > 0$, and thus
$x^{1/2} - x^{-1/2} > \log x$ for $x > 1$. Thus for $x > 1$ the derivative (5.70) is negative, and therefore (5.69)
has no solution with $x > 1$. Similarly $x^{1/2} - x^{-1/2} < \log x$ for $0 < x < 1$, and hence again (5.70)
is negative for $0 < x < 1$. So there are no solutions of (5.69) except $x = y = 1$. Therefore (5.66)
has no critical points except $x = y = 1$, and thus the function $f(x)\,f(\gamma x+1-\gamma)^{-1}$ achieves its
supremum at $x = 0$.
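
The conclusion of this proof is easy to probe numerically. The sketch below assumes $f(x) = x\log x - x + 1$ (with $f(0) = 1$ by continuity) and evaluates the ratio $f(x)/f(\gamma x + 1 - \gamma)$ on a logarithmic grid that avoids the removable singularity at $x = 1$; the value at $x = 0$ is $1/f(1-\gamma)$. The grid and the function names are illustrative choices, not taken from the paper.

```python
import math

def f(x):
    """f(x) = x log x - x + 1, extended by continuity to f(0) = 1."""
    return 1.0 if x == 0.0 else x * math.log(x) - x + 1.0

def ratio(x, gamma):
    """f(x) / f(gamma x + 1 - gamma); undefined (0/0) at x = 1."""
    return f(x) / f(gamma * x + 1.0 - gamma)

def log_grid(lo_exp=-6.0, hi_exp=6.0, step=0.25):
    """Points 10**e for e in [lo_exp, hi_exp], skipping e = 0 (that is, x = 1)."""
    count = int(round((hi_exp - lo_exp) / step)) + 1
    exps = [lo_exp + k * step for k in range(count)]
    return [10.0 ** e for e in exps if abs(e) > 1e-9]

gamma = 0.762
xs = log_grid()
print(ratio(0.0, gamma), max(ratio(x, gamma) for x in xs))
```
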
5.7 Proof of Lemma 13

Suppose first that $0 < h < d\log d$. Recall the definition

$$M_d(h) = \inf\Big\{\sum_{i=1}^d F(q_i d) \;:\; \sum_{i=1}^d f(q_i d) \ge h\Big\} \tag{5.71}$$

where $F(x) = -\log x + x - 1$ and $f(x) = x\log x - x + 1$. Letting $x_i = q_i d$ we have

$$M_d(h) = \inf\Big\{\sum_{i=1}^d F(x_i) \;:\; \sum_{i=1}^d f(x_i) \ge h,\ \sum_{i=1}^d x_i = d\Big\} \tag{5.72}$$

The gradient of the function $\sum_{i=1}^d F(x_i)$ is zero only at $x_1 = \cdots = x_d = 1$, hence since $h > 0$
there are no critical points of $\sum_{i=1}^d F(x_i)$ in the region $\sum_{i=1}^d f(x_i) \ge h$. Thus the infimum in (5.72)
is achieved at the boundary where $\sum_{i=1}^d f(x_i) = h$, so

$$M_d(h) = \inf\Big\{\sum_{i=1}^d F(x_i) \;:\; \sum_{i=1}^d f(x_i) = h,\ \sum_{i=1}^d x_i = d\Big\} \tag{5.73}$$

We introduce Lagrange multipliers and define

$$H(x_i,\alpha,\beta) = \sum_{i=1}^d F(x_i) - \alpha\Big(\sum_{i=1}^d f(x_i) - h\Big) - \beta\Big(\sum_{i=1}^d x_i - d\Big) \tag{5.74}$$

The critical equations for $H$ are

$$\frac{\partial H}{\partial x_i} = 1 - \frac{1}{x_i} - \alpha\log x_i - \beta = 0 \tag{5.75}$$

The constraints can be used to eliminate $\beta$ and obtain

$$\Big(1 + \alpha\frac{h}{d}\Big)x_i - 1 = \alpha\,x_i\log x_i, \qquad i = 1,\dots,d \tag{5.76}$$

If $\alpha \le 0$ the equations (5.76) have the unique solution $x_i = 1$ for all $i = 1,\dots,d$. However this does
not satisfy the constraint $\sum_{i=1}^d f(x_i) = h$ for $h > 0$. Thus $\alpha > 0$, in which case there are positive
numbers $w$ and $z$ satisfying

$$0 < w < 1 < z < d \tag{5.77}$$

such that the solutions of (5.76) are

$$x_1 = \cdots = x_k = w, \qquad x_{k+1} = \cdots = x_d = z \tag{5.78}$$

for some $1 \le k \le d-1$. The constraint conditions imply that

$$kw + (d-k)z = d, \qquad kw\log w + (d-k)z\log z = h \tag{5.79}$$

Thus (5.73) can be reformulated as

$$M_d(h) = \inf_{\substack{0<w<z \\ 1\le k\le d-1}}\big\{-k\log w - (d-k)\log z \;:\; kw\log w + (d-k)z\log z = h,\ kw + (d-k)z = d\big\} \tag{5.80}$$

We claim that $-k\log w - (d-k)\log z$, subject to the constraints $kw\log w + (d-k)z\log z = h$ and
$kw + (d-k)z = d$, is a decreasing function of $k$. In order to show this, we divide (5.80) by $d$ and
write $k = td$, and consider the function

$$Q(w,z,t) = -t\log w - (1-t)\log z \tag{5.81}$$

along with the constraints

$$tw\log w + (1-t)z\log z = \frac{h}{d}, \qquad tw + (1-t)z = 1 \tag{5.82}$$

The constraints (5.82) allow $w$, $z$ to be defined locally as functions of $t$. This follows from the
implicit function theorem since the Jacobian is $t(1-t)\log(w/z)$; we must have $w < z$ and hence
the Jacobian is nonzero. Solving these constraint equations for the derivatives gives

$$\frac{dw}{dt} = \frac{-w + z - w\log z + w\log w}{t\log(z/w)} \tag{5.83}$$

$$\frac{dz}{dt} = \frac{w - z - z\log w + z\log z}{(1-t)\log(z/w)} \tag{5.84}$$

Returning to (5.81) we can now compute its derivative with respect to $t$:

$$\frac{dQ}{dt} = -\log w + \log z - \frac{t}{w}\frac{dw}{dt} - \frac{1-t}{z}\frac{dz}{dt} = -\frac{1}{\log(z/w)}\Big[\frac{(z-w)^2}{zw} - \big(\log(z/w)\big)^2\Big] \tag{5.85}$$

Note that $2\log u < u - 1/u$ for all $u > 1$, hence

$$\log\Big(\frac{z}{w}\Big) < \sqrt{\frac{z}{w}} - \sqrt{\frac{w}{z}} \tag{5.86}$$

and therefore the right side of (5.85) is negative. Thus $Q$ is a decreasing function of $t$, and hence
the infimum in (5.80) is achieved at the largest possible value of $k$, namely $k = d-1$. This leads to

$$M_d(h) = \inf\Big\{-\log z - (d-1)\log\frac{d-z}{d-1} \;:\; z\log z + (d-z)\log\frac{d-z}{d-1} = h\Big\} \tag{5.87}$$
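
The monotonicity in $k$ can be illustrated numerically for a small example. The sketch below is a hedged illustration, not part of the proof: for fixed $d$, $h$ and each integer $k$, it eliminates $w$ through $kw + (d-k)z = d$, solves $kw\log w + (d-k)z\log z = h$ for $z > 1$ by bisection (assuming, as convexity suggests, that the left side increases with $z$ on the admissible interval), and evaluates $-k\log w - (d-k)\log z$. The function names and the sample values $d = 6$, $h = 0.5$ are arbitrary choices.

```python
import math

def xlogx(x):
    """x log x, with the convention 0 log 0 = 0 (also guards tiny negative round-off)."""
    return 0.0 if x <= 0.0 else x * math.log(x)

def two_level_value(d, k, h, iters=200):
    """Value of -k log w - (d-k) log z at the two-level point with k entries equal to w."""
    def w_of(z):
        return (d - (d - k) * z) / k

    def entropy_constraint(z):
        return k * xlogx(w_of(z)) + (d - k) * xlogx(z)

    lo, hi = 1.0, d / (d - k)  # w decreases from 1 to 0 as z runs over this interval
    if entropy_constraint(hi) <= h:
        raise ValueError("h is not attainable for this k")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if entropy_constraint(mid) < h:
            lo = mid
        else:
            hi = mid
    z = 0.5 * (lo + hi)
    w = w_of(z)
    return -k * math.log(w) - (d - k) * math.log(z)

values = [two_level_value(6, k, 0.5) for k in range(1, 6)]
print(values)  # expected to decrease with k, so k = d - 1 gives the infimum
```
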
The function $z\log z + (d-z)\log\frac{d-z}{d-1}$ is monotone increasing, reaching its maximum value $d\log d$
at $z = d$. Thus for any $0 \le h \le d\log d$ there is a unique value $z(d,h)$ satisfying the constraint
condition in (5.87). Its derivative is

$$\frac{\partial z}{\partial h} = \frac{d-z}{d\log z - h} \tag{5.88}$$

Furthermore the function $g(z) = -\log z - (d-1)\log\frac{d-z}{d-1}$ is also monotone increasing for $1 \le z \le d$,
with derivative $g'(z) = d(z-1)/z(d-z)$. Thus

$$M_d(h) - M_d(0) = g(z(d,h)) - g(z(d,0)) = \int_0^h g'(z(d,h'))\,\frac{\partial z}{\partial h'}\,dh' = \int_0^h \frac{d(z-1)}{z\,(d\log z - h')}\,dh' \ge \int_0^h \frac{1}{z}\,dh' \tag{5.89}$$

where the last step uses $z - 1 \ge \log z$ and $h' \ge 0$. The constraint condition in (5.87) implies that

$$h \le z\log z \le h + (d-z)\log\frac{d-1}{d-z} \le h + z - 1 \tag{5.90}$$

Thus

$$z \le \frac{h-1}{\log z - 1} \tag{5.91}$$

If $h \ge 2e^2$ then the first inequality in (5.90) implies that $z(d,h) \ge e^2$, and therefore $\log z \ge 2$.
From (5.91) it follows that

$$h \ge 2e^2 \;\Longrightarrow\; z(d,h) \le h - 1 \tag{5.92}$$

Thus from (5.89) we deduce that for $h \ge 2e^2$

$$M_d(h) - M_d(0) \ge \int_{2e^2}^h \frac{dh'}{z} \ge \int_{2e^2}^h \frac{dh'}{h'-1} = \log(h-1) - \log(2e^2 - 1) \tag{5.93}$$

Since $z(d,0) = 1$ and $g(1) = 0$ it follows that $M_d(0) = 0$, and hence the claimed lower bound holds.
Finally, to show that $M_d(x)$ is increasing, note that

$$\frac{dM_d}{dx} = \Big(-\frac{1}{z} + \frac{d-1}{d-z}\Big)\frac{dz}{dx} \tag{5.94}$$

where $z$ solves the constraint equation

$$z\log z + (d-z)\log\frac{d-z}{d-1} = x \tag{5.95}$$

Differentiating (5.95) gives

$$\frac{dz}{dx} = \Big(\log z - \log\frac{d-z}{d-1}\Big)^{-1} > 0 \tag{5.96}$$

since $z > 1$. Also $-\frac{1}{z} + \frac{d-1}{d-z} > 0$, hence (5.94) shows that $M_d$ is increasing.



6 Discussion 



Hastings' Theorem finally settles the question of additivity of Holevo capacity for quantum channels, 
as well as additivity of minimal output entropy and entanglement of formation. In this paper we 
have explored in detail the proof of Hastings' result, and we have provided some estimates for the 
minimal dimensions necessary in order to find a violation of additivity. The violation of additivity 
seems to be a small effect for this class of models, requiring delicate and explicit estimates for the 
proof. It is an open question whether there are random unitary channels with large violations of 
additivity. Hastings' Theorem is non-constructive, and it would be extremely interesting to find 
explicit channels which demonstrate the effect. Presumably non-additivity of Holevo capacity is
generic, and there may be other classes of channels where the effect is larger. 

Having established non-additivity of Holevo capacity, one is led to the question of finding useful 
bounds for the channel capacity C($). One may even hope to find a compact 'single-letter' formula 
for C($), though that possibility seems remote. It is likely that the methods introduced by Hastings 
will prove to be useful in addressing these questions. 
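A Derivation of bound for Z(n, d)

We consider the following integral.

$$Z(n,d) = \int \delta\Big(1 - \sum_{i=1}^d p_i\Big)\prod_{1\le i<j\le d}(p_i - p_j)^2\prod_{k=1}^d p_k^{\,n-d}\,dp_k \tag{A.1}$$

$$= \frac{1}{(dn-1)!}\int_0^\infty e^{-r}r^{dn-1}\,dr\int \delta\Big(1 - \sum_{i=1}^d p_i\Big)\prod_{1\le i<j\le d}(p_i - p_j)^2\prod_{k=1}^d p_k^{\,n-d}\,dp_k \tag{A.2}$$

Consider the following change of variables.

$$q_i = r\,p_i \quad (i = 1,\dots,d-1), \qquad q_d = r\,(1 - p_1 - \dots - p_{d-1}) \tag{A.3}$$

The Jacobian is

$$\left|\frac{\partial(q_1,\dots,q_d)}{\partial(p_1,\dots,p_{d-1},r)}\right| = \begin{vmatrix} r & & & p_1 \\ & \ddots & & \vdots \\ & & r & p_{d-1} \\ -r & \cdots & -r & 1 - p_1 - \dots - p_{d-1} \end{vmatrix} \tag{A.4}$$

$$= \begin{vmatrix} r & & & p_1 \\ & \ddots & & \vdots \\ & & r & p_{d-1} \\ 0 & \cdots & 0 & 1 \end{vmatrix} \tag{A.5}$$

$$= r^{d-1} \tag{A.6}$$

After the change of variables we have

$$Z(n,d) = \frac{1}{(dn-1)!}\int \prod_{1\le i<j\le d}(q_i - q_j)^2\prod_{k=1}^d e^{-q_k}q_k^{\,n-d}\,dq_k \tag{A.7}$$

However,

$$\prod_{1\le i<j\le d}(q_j - q_i) = \begin{vmatrix} 1 & \cdots & 1 \\ q_1 & \cdots & q_d \\ \vdots & & \vdots \\ q_1^{d-1} & \cdots & q_d^{d-1} \end{vmatrix} \tag{A.8}$$

$$= \begin{vmatrix} 1 & \cdots & 1 \\ P_1(q_1) & \cdots & P_1(q_d) \\ \vdots & & \vdots \\ P_{d-1}(q_1) & \cdots & P_{d-1}(q_d) \end{vmatrix} \tag{A.9}$$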

Here, $P_k$ is any monic polynomial of degree $k$. So, set $P_k = (-1)^k k!\,L_k^{(n-d)}$ where $L_k^{(n-d)}$ is the Laguerre
polynomial. Then, we have

$$\int_0^\infty P_k(x)P_l(x)\,e^{-x}x^{n-d}\,dx = \delta_{kl}\,\Gamma(n-d+k+1)\,\Gamma(k+1) \tag{A.10}$$
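Hence

$$Z(n,d) = \frac{1}{(dn-1)!}\int\Big(\sum_\sigma \mathrm{sgn}(\sigma)\prod_{i=1}^d P_{\sigma(i)}(q_i)\Big)^2\prod_{k=1}^d e^{-q_k}q_k^{\,n-d}\,dq_k \tag{A.11}$$

$$= \frac{1}{(dn-1)!}\sum_\sigma\prod_{i=1}^d\int_0^\infty P_{\sigma(i)}(q)^2\,e^{-q}q^{\,n-d}\,dq \tag{A.12}$$

$$= \frac{d!}{(dn-1)!}\prod_{k=0}^{d-1}\Gamma(n-d+k+1)\,\Gamma(k+1) \tag{A.13}$$

$$= \frac{1}{\Gamma(dn)}\prod_{k=1}^{d}\Gamma(n-d+k)\,\Gamma(k+1). \tag{A.14}$$

Here, $\sigma$ are "permutations"; $\sigma : \{1,\dots,d\} \to \{0,\dots,d-1\}$.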

To evaluate this quantity we use the following fact: $\Gamma(s)$ is approximated by

$$\exp\Big[s\log s - s - \frac12\log s + \log\sqrt{2\pi} + \log\big(1 + O(s^{-1})\big)\Big] \tag{A.15}$$

as $s \to +\infty$. Then, we have

$$\exp\{(s-1)\log s - s\} \le (\mathrm{A.15}) \le \exp\{s\log s - s\}. \tag{A.16}$$

Note that the above upper bound is true only for large enough $s$ but in our case it is not a problem.
By using these bounds we get a lower bound for (A.14).
First, $\log(1/\Gamma(dn))$ is lower bounded by

$$-(dn)\log(dn) + dn = -dn\log d - dn\log n + dn. \tag{A.17}$$

Secondly, $\log\big(\prod_{k=1}^d \Gamma(n-d+k)\big)$ is lower bounded by

$$\sum_{k=1}^d (n-d+k-1)\log(n-d+k) - \sum_{k=1}^d (n-d+k) \tag{A.18}$$

$$= n\sum_{k=1}^d \frac{n-d+k-1}{n}\log\Big(\frac{n-d+k}{n}\Big) + \sum_{k=1}^d (n-d+k-1)\log n - \sum_{k=1}^d (n-d+k). \tag{A.19}$$

The first sum in (A.19) is approximately lower bounded by

$$n\cdot d\cdot\frac{n-d}{n}\log\Big(\frac{n-d}{n}\Big) = d(n-d)\big(\log(n-d) - \log n\big) \to -d^2, \tag{A.20}$$

as $n \to \infty$. Also, the remaining part in (A.19) is

$$\Big(dn - \frac12 d^2 - \frac12 d\Big)\log n - \Big(dn - \frac12 d^2 + \frac12 d\Big). \tag{A.21}$$

Thirdly, $\log\big(\prod_{k=1}^d \Gamma(k+1)\big)$ is lower bounded by

$$\sum_{k=1}^d k\log(k+1) - \sum_{k=1}^d (k+1) \tag{A.22}$$

$$= d\sum_{k=1}^d \frac{k}{d}\log\Big(\frac{k+1}{d}\Big) + \sum_{k=1}^d k\log d - \sum_{k=1}^d (k+1). \tag{A.23}$$


Again, the first term in (1A.23P is approximately lower bounded by —d"^. Also, the remaining part 
in f[03|) is 



Id^ + ^dj \ogd- (^d^ + h). 



'-he - 








U 


logn + 


2 


2 



As a whole, we know that the inside of exp in ( 1A.14P is lower bounded by 
(ICTD + (IX2B + (1X241) - 2rf2 

-dn + -d^ + -d log d-2d^ - 2d 
2 2 J ^ 

= -d^ log n + {d^ - dn) log d + ^d{d - 1) (log n - log d - 4) 
> ~d^ log n+ [cP — dn) log d 
if (logn — logfi) > 4. Therefore, we get an upper bound for the normalization constant: 

in this case. 



(A.24) 

(A.25) 
(A.26) 

(A.27) 
(A.28) 

(A.29) 



B Proof of Proposition 14

Let $Z_1,\dots,Z_n$ be IID complex Gaussian random variables with mean zero and variance one. Apply
the result (5.22) with $m = n-1$ to deduce that

$$(Z_2,\dots,Z_n)^T = \rho\,|\phi\rangle \tag{B.1}$$

where $|\phi\rangle$ is a random unit vector in $\mathbb{C}^{n-1} \subset \mathbb{C}^n$, independent of $\rho = (|Z_2|^2 + \cdots + |Z_n|^2)^{1/2}$. Then
apply (5.22) with $m = n$ to deduce

$$(Z_1,\dots,Z_n)^T = R\,|\theta\rangle \tag{B.2}$$

where $|\theta\rangle$ is a random unit vector in $\mathbb{C}^n$, independent of $R = (|Z_1|^2 + \cdots + |Z_n|^2)^{1/2}$. Let

$$x = \frac{Z_1}{R}, \tag{B.3}$$

and recall that $|\psi\rangle = (1, 0, \dots, 0)^T$. Then

$$|\theta\rangle = \frac{1}{R}(Z_1,\dots,Z_n)^T = x\,|\psi\rangle + \frac{\rho}{R}\,|\phi\rangle = x\,|\psi\rangle + \sqrt{1 - |x|^2}\;|\phi\rangle \tag{B.4}$$

This proves the first part of Proposition 14 since $|\phi\rangle$ is independent of $Z_1$ and $\rho$, and hence is
independent of $x$.

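To prove the second part of Proposition 14 we use the representation (B.3) to derive the distribution
of $|x|^2$. We write $Z_j = X_{2j-1} + iX_{2j}$ where $\{X_j\}$ are IID real normal random variables with
mean zero and variance one, so from (B.3) it follows that

$$|x|^2 = \frac{X_1^2 + X_2^2}{X_1^2 + \cdots + X_{2n}^2} \tag{B.5}$$

Note that $X_1^2 + \cdots + X_k^2$ has the following Chi-square probability distribution:

$$f_k(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2-1}e^{-x/2} \tag{B.6}$$

Hence, set

$$X = X_1^2 + X_2^2 \tag{B.7}$$

$$Y = X_3^2 + \cdots + X_{2n}^2 \tag{B.8}$$

and then $X$ and $Y$ are independent and have the following probability distributions

$$f_X(x) = \frac12 e^{-x/2} \tag{B.9}$$

$$f_Y(y) = \frac{1}{2^{n-1}(n-2)!}\,y^{n-2}e^{-y/2} \tag{B.10}$$

However,

$$\frac{X}{X+Y} \le t \iff X(1-t) \le tY \tag{B.11}$$

implies that the cumulative function of $\frac{X}{X+Y}$ is

$$F(t) = \int_0^\infty\Big(\int_0^{ty/(1-t)} f_X(x)\,dx\Big)f_Y(y)\,dy = \int_0^\infty\Big(1 - e^{-\frac{ty}{2(1-t)}}\Big)f_Y(y)\,dy \tag{B.12}$$

$$= 1 - \int_0^\infty e^{-\frac{ty}{2(1-t)}}f_Y(y)\,dy \tag{B.13}$$

$$= 1 - \frac{1}{2^{n-1}(n-2)!}\int_0^\infty y^{n-2}e^{-\frac{y}{2(1-t)}}\,dy \tag{B.14}$$

Here,

$$\int_0^\infty y^{n-2}e^{-\frac{y}{2(1-t)}}\,dy = \big(2(1-t)\big)^{n-1}\int_0^\infty u^{n-2}e^{-u}\,du \tag{B.15}$$

$$= 2^{n-1}(1-t)^{n-1}(n-2)!. \tag{B.16}$$

Therefore

$$F(t) = 1 - (1-t)^{n-1}. \tag{B.17}$$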
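
The distribution derived above can be checked by simulation. The following Monte Carlo sketch is only illustrative: it samples a uniformly random unit vector in $\mathbb{C}^n$ by normalizing complex Gaussians (real and imaginary parts independent $N(0,1)$, as in the representation used in this appendix), and estimates the probability that $|x|^2$ exceeds $t$, which by (B.17) should be close to $(1-t)^{n-1}$. The function names, sample size and seed are arbitrary choices.

```python
import random

def overlap_squared(n, rng):
    """|x|^2 = |Z_1|^2 / (|Z_1|^2 + ... + |Z_n|^2) for IID complex Gaussians Z_j."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    total = sum(abs(c) ** 2 for c in z)
    return abs(z[0]) ** 2 / total

def tail_estimate(n, t, samples, seed=0):
    """Monte Carlo estimate of the probability that |x|^2 > t."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if overlap_squared(n, rng) > t)
    return hits / samples

n, t = 5, 0.3
print(tail_estimate(n, t, 20000), (1.0 - t) ** (n - 1))
```
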